NOVEL METHODS OF DIAGNOSIS OF ANGIOGENESIS, 
COMPOSITIONS AND METHODS OF SCREENING FOR 
ANGIOGENESIS MODULATORS 

CROSS-REFERENCES TO RELATED APPLICATIONS 
The present application is a continuation-in-part (CIP) of co-pending United
States Patent Application "Novel Methods Of Diagnosis Of Angiogenesis, Compositions And
Methods Of Screening For Angiogenesis Modulators", Attorney Docket No. A65110-1, filed
on August 11, 2000, which claims the benefit of priority to U.S.S.N. 60/148,425, filed August
11, 1999, both of which are incorporated herein by reference.

FIELD OF THE INVENTION 
The invention relates to the identification of nucleic acid and protein 
expression profiles and nucleic acids, products, and antibodies thereto that are involved in 
angiogenesis; and to the use of such expression profiles and compositions in diagnosis and 
therapy of angiogenesis. The invention further relates to methods for identifying and using 
agents and/or targets that modulate angiogenesis. 

BACKGROUND OF THE INVENTION 
Both vasculogenesis, the development of an interactive vascular system 
comprising arteries and veins, and angiogenesis, the generation of new blood vessels, play a 
role in embryonic development. In contrast, angiogenesis is limited in a normal adult to the 
placenta, ovary, endometrium and sites of wound healing. However, angiogenesis, or its 
absence, plays an important role in the maintenance of a variety of pathological states. Some 
of these states are characterized by neovascularization, e.g., cancer, diabetic retinopathy, 
glaucoma, and age related macular degeneration. Others, e.g., stroke, infertility, heart 
disease, ulcers, and scleroderma, are diseases of angiogenic insufficiency. 

Angiogenesis has a number of stages (see, e.g., Folkman, J. Natl. Cancer Inst.
82:4-6, 1990; Firestein, J. Clin. Invest. 103:3-4, 1999; Koch, Arthritis Rheum. 41:951-62, 1998;
Carter, Oncologist 5(Suppl 1):51-4, 2000; Browder et al., Cancer Res. 60:1878-86, 2000; and
Zhu and Witte, Invest. New Drugs 17:195-212, 1999). The early stages of angiogenesis
include endothelial cell protease production, migration of cells, and proliferation. The early
stages also appear to require some growth factors, with VEGF, TGF-α, angiostatin, and
selected chemokines all putatively playing a role. Later stages of angiogenesis include 
population of the vessels with mural cells (pericytes or smooth muscle cells), basement 
membrane production, and the induction of vessel bed specializations. The final stages of 
vessel formation include what is known as "remodeling", wherein a forming vasculature
becomes a stable, mature vessel bed. Thus, the process is highly dynamic, often requiring 
coordinated spatial and temporal waves of gene expression. 

Conversely, the complex process may be subject to disruption by interfering
with one or more critical steps. Thus, the lack of understanding of the dynamics of
angiogenesis prevents therapeutic intervention in serious diseases such as those indicated. It
is an object of the invention to provide methods that can be used to screen compounds for the
ability to modulate angiogenesis. Additionally, it is an object to provide molecular targets for
therapeutic intervention in disease states which have either an undesirable excess or a deficit
in angiogenesis. The present invention provides solutions to both.

SUMMARY OF THE INVENTION

The present invention provides compositions and methods for detecting or
modulating angiogenesis-associated sequences.

In one aspect, the invention provides a method of detecting an angiogenesis-associated
transcript in a cell in a patient, the method comprising contacting a biological
sample from the patient with a polynucleotide that selectively hybridizes to a sequence at
least 80% identical to a sequence as shown in Table 1. In one embodiment, the biological
sample is a tissue sample. In another embodiment, the biological sample comprises isolated
nucleic acids, which are often mRNA.

In another embodiment, the method further comprises the step of amplifying
nucleic acids before the step of contacting the biological sample with the polynucleotide.
Often, the polynucleotide comprises a sequence as shown in Table 1. The polynucleotide can 
be labeled, for example, with a fluorescent label and can be immobilized on a solid surface. 

In other embodiments, the patient is undergoing a therapeutic regimen to treat a
disease associated with angiogenesis, or the patient is suspected of having an angiogenesis-associated
disorder.

In another aspect, the invention comprises an isolated nucleic acid molecule 
consisting of a polynucleotide sequence as shown in Table 1. The nucleic acid molecule can 
be labeled, for example, with a fluorescent label.

In other aspects, the invention provides an expression vector comprising an
isolated nucleic acid molecule consisting of a polynucleotide sequence as shown in Table 1 or 
a host cell comprising the expression vector. 

In another embodiment, the isolated nucleic acid molecule encodes a 
polypeptide having an amino acid sequence as shown in Table 2.

In another aspect, the invention provides an isolated polypeptide which is 
encoded by a nucleic acid molecule having a polynucleotide sequence as shown in Table 1. In
one embodiment, the isolated polypeptide has an amino acid sequence as shown in Table 2. 

In another embodiment, the invention provides an antibody that specifically 
binds a polypeptide that has an amino acid sequence as shown in Table 2. The antibody can
be conjugated to an effector component such as a fluorescent label, a toxin, or a radioisotope. 
In some embodiments, the antibody is an antibody fragment or a humanized antibody. 

In another aspect, the invention provides a method of detecting a cell
undergoing angiogenesis in a biological sample from a patient, the method comprising
contacting the biological sample with an antibody that specifically binds to a polypeptide
that has an amino acid sequence as shown in Table 2. In some embodiments, the antibody is
further conjugated to an effector component, for example, a fluorescent label.

In another embodiment, the invention provides a method of detecting 
antibodies specific to angiogenesis in a patient, the method comprising contacting a 
biological sample from the patient with a polypeptide comprising a sequence as shown in
Table 2. 

The invention also provides a method of identifying a compound that 
modulates the activity of an angiogenesis-associated polypeptide, the method comprising the 
steps of: (i) contacting the compound with a polypeptide that has at least 80% identity
to an amino acid sequence as shown in Table 2; and (ii) detecting an increase or a decrease in
the activity of the polypeptide. In one embodiment, the polypeptide has an amino acid 
sequence as shown in Table 2. In another embodiment, the polypeptide is expressed in a cell. 

The invention also provides a method of identifying a compound that 
modulates angiogenesis, the method comprising the steps of: (i) contacting the compound with a
cell undergoing angiogenesis; and (ii) detecting an increase or a decrease in the expression of
a polypeptide sequence as shown in Table 2. In one embodiment, the detecting step 
comprises hybridizing a nucleic acid sample from the cell with a polynucleotide that 
selectively hybridizes to a sequence at least 80% identical to a sequence as shown in Table 1. 
In another embodiment, the method further comprises detecting an increase or decrease in the
expression of a second sequence as shown in Table 2. 

In another embodiment, the invention provides a method of inhibiting 
angiogenesis in a cell that expresses a polypeptide at least 80% identical to a sequence as 
shown in Table 2, the method comprising the step of contacting the cell with a therapeutically 
effective amount of an inhibitor of the polypeptide. In one embodiment, the polypeptide has 
an amino acid sequence shown in Table 2. In another embodiment, the inhibitor is an 
antibody. 

In other embodiments, the invention provides a method of activating 
angiogenesis in a cell that expresses a polypeptide at least 80% identical to a sequence as
shown in Table 2, the method comprising the step of contacting the cell with a therapeutically
effective amount of an activator of the polypeptide. In one embodiment, the polypeptide has 
an amino acid sequence shown in Table 2. 

Other aspects of the invention will become apparent to the skilled artisan from
the following description of the invention.

Table 1 provides nucleotide sequences of genes that exhibit changes in
expression levels as a function of time in tissue undergoing angiogenesis compared to tissue 
that is not. 

Table 2 provides polypeptide sequences of proteins that exhibit changes in
expression levels as a function of time in tissue undergoing angiogenesis compared to tissue 
that is not. 

DESCRIPTION OF THE SPECIFIC EMBODIMENTS 
In accordance with the objects outlined above, the present invention provides
novel methods for diagnosis and treatment of disorders associated with angiogenesis
(sometimes referred to herein as angiogenesis disorders or AD), as well as methods for
screening for compositions which modulate angiogenesis. By "disorder associated with
angiogenesis" or "disease associated with angiogenesis" herein is meant a disease state which
is marked by either an excess or a deficit of vessel development. Angiogenesis disorders
associated with increased angiogenesis include, but are not limited to, cancer and proliferative
diabetic retinopathy. Pathological states for which it may be desirable to increase
angiogenesis include stroke, heart disease, infertility, ulcers, and scleroderma. Also provided
are methods for treating AD.

Definitions

The term "angiogenesis protein" or "angiogenesis polynucleotide" refers to
nucleic acid and polypeptide polymorphic variants, alleles, mutants, and interspecies
homologs that: (1) have an amino acid sequence that has greater than about 60% amino acid
sequence identity, 65%, 70%, 75%, 80%, 85%, 90%, preferably 91%, 92%, 93%, 94%, 95%,
96%, 97%, 98% or 99% or greater amino acid sequence identity, preferably over a region of
at least about 25, 50, 100, 200, 500, 1000, or more amino acids, to an
angiogenesis protein sequence of Table 2; (2) bind to antibodies, e.g., polyclonal antibodies,
raised against an immunogen comprising an amino acid sequence of Table 2, and
conservatively modified variants thereof; (3) specifically hybridize under stringent
hybridization conditions to an anti-sense strand corresponding to a nucleic acid sequence of
Table 1 and conservatively modified variants thereof; (4) have a nucleic acid sequence that
has greater than about 95%, preferably greater than about 96%, 97%, 98%, 99%, or higher
nucleotide sequence identity, preferably over a region of at least about 25, 50, 100, 200, 500,
1000, or more nucleotides, to a sense sequence corresponding to one set out in Table 1. A
polynucleotide or polypeptide sequence is typically from a mammal including, but not
limited to, primate, e.g., human; rodent, e.g., rat, mouse, hamster; cow, pig, horse, sheep, or
any mammal. An "angiogenesis polypeptide" and an "angiogenesis polynucleotide" include
both naturally occurring and recombinant forms.

A "full length" angiogenesis protein or nucleic acid refers to an angiogenesis
polypeptide or polynucleotide sequence, or a variant thereof, that contains all of the elements
normally contained in one or more naturally occurring, wild type angiogenesis polynucleotide
or polypeptide sequences. The "full length" may be prior to, or after, various stages of
post-translational processing.

"Biological sample" as used herein is a sample of biological tissue or fluid that
contains nucleic acids or polypeptides, e.g., of an angiogenic protein. Such samples include,
but are not limited to, tissue isolated from primates, e.g., humans, or rodents, e.g., mice, and
rats. Biological samples may also include sections of tissues such as biopsy and autopsy
samples, and frozen sections taken for histologic purposes. A biological sample is typically
obtained from a eukaryotic organism, most preferably a mammal such as a primate, e.g.,
chimpanzee or human; cow; dog; cat; a rodent, e.g., guinea pig, rat, mouse; rabbit; or a bird;
reptile; or fish.

"Providing a biological sample" means to obtain a biological sample for use in 
methods described in this invention. Most often, this will be done by removing a sample of 
cells from an animal, but can also be accomplished by using previously isolated cells (e.g.,
isolated by another person, at another time, and/or for another purpose), or by performing the
methods of the invention in vivo. Archival tissues, having treatment or outcome history, will
be particularly useful.

The terms "identical" or percent "identity," in the context of two or more
nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that
are the same or have a specified percentage of amino acid residues or nucleotides that are the
same (i.e., about 70% identity, preferably 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%,
96%, 97%, 98%, 99%, or higher identity over a specified region (e.g., SEQ ID NOS:1-4),
when compared and aligned for maximum correspondence over a comparison window or
designated region) as measured using the BLAST or BLAST 2.0 sequence comparison
algorithms with default parameters described below, or by manual alignment and visual
inspection (see, e.g., NCBI web site http://www.ncbi.nlm.nih.gov/BLAST/ or the like). Such
sequences are then said to be "substantially identical." This definition also refers to, or may
be applied to, the complement of a test sequence. The definition also includes sequences that
have deletions and/or additions, as well as those that have substitutions. As described below,
the preferred algorithms can account for gaps and the like. Preferably, identity exists over a
region that is at least about 25 amino acids or nucleotides in length, or more preferably over a
region that is 50-100 amino acids or nucleotides in length.

For sequence comparison, typically one sequence acts as a reference sequence,
to which test sequences are compared. When using a sequence comparison algorithm, test
and reference sequences are entered into a computer, subsequence coordinates are designated, 
if necessary, and sequence algorithm program parameters are designated. Preferably, default 
program parameters can be used, or alternative parameters can be designated. The sequence 
25 comparison algorithm then calculates the percent sequence identities for the test sequences 
relative to the reference sequence, based on the program parameters. 
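
As a minimal sketch of the final step above, the following Python function computes percent identity once a test sequence has already been aligned to a reference sequence. It assumes the alignment is given as two equal-length strings with `-` marking gaps; the name `percent_identity` and the choice of denominator (all aligned columns except those where both sequences are gapped) are illustrative assumptions, not necessarily the convention used by BLAST or any other program.

```python
def percent_identity(ref_aln: str, test_aln: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Columns where both sequences are gapped are ignored; every other
    column counts toward the denominator (an assumed convention).
    """
    if len(ref_aln) != len(test_aln):
        raise ValueError("aligned sequences must have equal length")
    columns = [(r, t) for r, t in zip(ref_aln, test_aln) if not (r == "-" and t == "-")]
    if not columns:
        return 0.0
    # A column is identical only if both residues match and neither is a gap.
    matches = sum(1 for r, t in columns if r == t and r != "-")
    return 100.0 * matches / len(columns)
```

For example, `percent_identity("ACGT-ACGT", "ACGTTACGA")` counts 7 identical columns out of 9, i.e. roughly 77.8%, which would fall below an 80% identity threshold.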

A "comparison window", as used herein, includes reference to a segment of 
any one of the number of contiguous positions selected from the group consisting of from 20 
to 600, usually about 50 to about 200, more usually about 100 to about 150 in which a 
sequence may be compared to a reference sequence of the same number of contiguous
positions after the two sequences are optimally aligned. Methods of alignment of sequences
for comparison are well-known in the art. Optimal alignment of sequences for comparison 
can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. 
Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. 
Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat'l.
Acad. Sci. USA 85:2444 (1988), by computerized implementations of these algorithms 
(GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, 
Genetics Computer Group, 575 Science Dr., Madison, WI), or by manual alignment and 
visual inspection (see, e.g., Current Protocols in Molecular Biology (Ausubel et al., eds.,
1995 supplement)).
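
For illustration, the global alignment approach of Needleman & Wunsch cited above can be sketched in Python as a dynamic program followed by a traceback. The scoring values (`match=1`, `mismatch=-1`, and a linear `gap=-1`) are hypothetical placeholders chosen for readability; they are not parameters taken from this document or from the GAP program.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

With these placeholder scores, `needleman_wunsch("ACGT", "AGT")` aligns `ACGT` against `A-GT` with a score of 2. A local (Smith & Waterman) variant differs mainly in flooring each cell at zero and tracing back from the highest-scoring cell.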

Preferred examples of algorithms that are suitable for determining percent
sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which
are described in Altschul et al., J. Mol. Biol. 215:403-410 (1990) and Altschul et al., Nuc.
Acids Res. 25:3389-3402 (1997), respectively. BLAST and BLAST 2.0 are used, with the
parameters described herein, to determine percent sequence identity for the nucleic acids and
proteins of the invention. Software for performing BLAST analyses is publicly available
through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/).
This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying
short words of length W in the query sequence, which either match or satisfy some
positive-valued threshold score T when aligned with a word of the same length in a database
sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra).
These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs
containing them. The word hits are extended in both directions along each sequence for as
far as the cumulative alignment score can be increased. Cumulative scores are calculated
using, for nucleotide sequences, the parameters M (reward score for a pair of matching
residues; always > 0) and N (penalty score for mismatching residues; always < 0). For amino
acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the
word hits in each direction is halted when: the cumulative alignment score falls off by the
quantity X from its maximum achieved value; the cumulative score goes to zero or below,
due to the accumulation of one or more negative-scoring residue alignments; or the end of
either sequence is reached. The BLAST algorithm parameters W, T, and X determine the
sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences)
uses as defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=-4 and a
comparison of both strands. For amino acid sequences, the BLASTP program uses as
defaults a wordlength of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix
(see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992)) alignments (B) of
50, expectation (E) of 10, M=5, N=-4, and a comparison of both strands.
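
The seed-and-extend steps described above can be sketched for nucleotide sequences as follows. This is a simplified, hypothetical illustration, not NCBI's implementation: seeds are exact word matches of length W (as is typical for nucleotide searches), extension is ungapped, and the X-drop value `x=20` is an arbitrary assumption. W=11, M=5 and N=-4 follow the BLASTN defaults quoted in the text; the function names `find_seeds` and `extend_hit` are illustrative.

```python
def find_seeds(query: str, subject: str, w: int = 11):
    """(query_pos, subject_pos) pairs where an identical word of length w occurs in both."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j) for i in range(len(query) - w + 1) for j in index.get(query[i:i + w], [])]


def extend_hit(query, subject, qi, sj, w=11, m=5, n=-4, x=20):
    """Ungapped X-drop extension of a seed.

    Returns (score, query_start, query_end_exclusive, subject_start).
    """
    def pair(a, b):
        # M (reward) for a matching pair, N (penalty) for a mismatch
        return m if a == b else n

    seed = sum(pair(query[qi + k], subject[sj + k]) for k in range(w))

    # Extend to the right; the while condition stops at the end of either sequence.
    best_right = best_k = gain = k = 0
    while qi + w + k < len(query) and sj + w + k < len(subject):
        gain += pair(query[qi + w + k], subject[sj + w + k])
        k += 1
        if gain > best_right:
            best_right, best_k = gain, k
        # Halt when the score falls X below its maximum or the cumulative score is <= 0.
        if best_right - gain > x or seed + gain <= 0:
            break

    # Extend to the left, starting from the best right-hand result.
    best_left = best_kl = gain = k = 0
    while qi - k - 1 >= 0 and sj - k - 1 >= 0:
        gain += pair(query[qi - k - 1], subject[sj - k - 1])
        k += 1
        if gain > best_left:
            best_left, best_kl = gain, k
        if best_left - gain > x or seed + best_right + gain <= 0:
            break

    return seed + best_right + best_left, qi - best_kl, qi + w + best_k, sj - best_kl
```

The three halting conditions from the text map onto the loops directly: the X-drop test, the check that the cumulative score has reached zero or below, and the `while` bounds for the end of either sequence. Overlapping seeds on the same diagonal extend to the same HSP, which real implementations deduplicate.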

The BLAST algorithm also performs a statistical analysis of the similarity
between two sequences (see, e.g., Karlin & Altschul, Proc. Natl. Acad. Sci. USA 90:5873-5877
(1993)). One measure of similarity provided by the BLAST algorithm is the smallest
sum probability (P(N)), which provides an indication of the probability by which a match
between two nucleotide or amino acid sequences would occur by chance. For example, a
nucleic acid is considered similar to a reference sequence if the smallest sum probability in a 
comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more 
preferably less than about 0.01, and most preferably less than about 0.001. 
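
The idea of a chance-match probability can be made concrete with the Karlin-Altschul formula for a single ungapped HSP, E = K*m*n*exp(-lambda*S), where m and n are the sequence lengths and S is the HSP score, together with the probability of at least one chance hit, P = 1 - exp(-E). This is a simplified stand-in: BLAST's smallest sum probability P(N) combines several HSPs through sum statistics, which this sketch does not implement, and the `lam` and `k` values are inputs the caller must supply (they depend on the scoring system and residue composition).

```python
import math


def expected_chance_hsps(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul E-value: expected number of chance HSPs scoring >= score
    between sequences of lengths m and n."""
    return k * m * n * math.exp(-lam * score)


def chance_probability(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Probability of at least one chance HSP scoring >= score (Poisson assumption)."""
    return 1.0 - math.exp(-expected_chance_hsps(score, m, n, lam, k))
```

Used with hypothetical parameters such as `lam=0.3, k=0.1`, a test sequence would be called similar to the reference when `chance_probability(...)` falls below a cutoff such as 0.001, mirroring the thresholds given in the text; the probability falls as the HSP score rises.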

An indication that two nucleic acid sequences or polypeptides are substantially
identical is that the polypeptide encoded by the first nucleic acid is immunologically cross
reactive with the antibodies raised against the polypeptide encoded by the second nucleic
acid, as described below. Thus, a polypeptide is typically substantially identical to a second
polypeptide, for example, where the two peptides differ only by conservative substitutions.
Another indication that two nucleic acid sequences are substantially identical is that the two
molecules or their complements hybridize to each other under stringent conditions, as
described below. Yet another indication that two nucleic acid sequences are substantially
identical is that the same primers can be used to amplify the sequences.

A "host cell" is a naturally occurring cell or a transformed cell that contains an
expression vector and supports the replication or expression of the expression vector. Host
cells may be cultured cells, explants, cells in vivo, and the like. Host cells may be
prokaryotic cells such as E. coli, or eukaryotic cells such as yeast, insect, amphibian, or
mammalian cells such as CHO, HeLa, and the like (see, e.g., the American Type Culture
Collection catalog or web site, www.atcc.org).

The terms "polypeptide," "peptide" and "protein" are used interchangeably
herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers
in which one or more amino acid residues are an artificial chemical mimetic of a corresponding
naturally occurring amino acid, as well as to naturally occurring amino acid polymers and
non-naturally occurring amino acid polymers.

The term "amino acid" refers to naturally occurring and synthetic amino acids,
as well as amino acid analogs and amino acid mimetics that function in a manner similar to
the naturally occurring amino acids. Naturally occurring amino acids are those encoded by
the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline,
γ-carboxyglutamate, and O-phosphoserine. Amino acid analogs refer to compounds that have
the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is
bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine,
norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified
R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical
structure as a naturally occurring amino acid. Amino acid mimetics refer to chemical
compounds that have a structure that is different from the general chemical structure of an
amino acid, but that function in a manner similar to a naturally occurring amino acid.

Amino acids may be referred to herein by either their commonly known three 
letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical 
Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly 
accepted single-letter codes.

"Conservatively modified variants" applies to both amino acid and nucleic
acid sequences. With respect to particular nucleic acid sequences, conservatively modified
variants refers to those nucleic acids which encode identical or essentially identical amino
acid sequences, or where the nucleic acid does not encode an amino acid sequence, to
essentially identical sequences. Because of the degeneracy of the genetic code, a large
number of functionally identical nucleic acids encode any given protein. For instance, the
codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every
position where an alanine is specified by a codon, the codon can be altered to any of the
corresponding codons described without altering the encoded polypeptide. Such nucleic acid
variations are "silent variations," which are one species of conservatively modified
variations. Every nucleic acid sequence herein which encodes a polypeptide also describes
every possible silent variation of the nucleic acid. One of skill will recognize that each codon
in a nucleic acid (except AUG, which is ordinarily the only codon for methionine, and TGG,
which is ordinarily the only codon for tryptophan) can be modified to yield a functionally
identical molecule. Accordingly, each silent variation of a nucleic acid which encodes a
polypeptide is implicit in each described sequence with respect to the expression product, but
not with respect to actual probe sequences.
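
The notion that each coding sequence implicitly describes all of its silent variations can be sketched in Python by enumerating every combination of synonymous codons. To keep the example short, the codon table below is a deliberately partial, hypothetical subset (written as RNA codons) covering only methionine, tryptophan and alanine; a real implementation would use the full 64-codon genetic code.

```python
from itertools import product

# Partial RNA codon table: only the amino acids used in the example below.
# (Assumption for brevity; the standard genetic code has 64 codons.)
SYNONYMOUS_CODONS = {
    "M": ["AUG"],                       # methionine: single codon
    "W": ["UGG"],                       # tryptophan: single codon
    "A": ["GCA", "GCC", "GCG", "GCU"],  # alanine: four synonymous codons
}
CODON_TO_AA = {c: aa for aa, codons in SYNONYMOUS_CODONS.items() for c in codons}


def translate(rna: str) -> str:
    """Translate an RNA coding sequence using the partial table above."""
    return "".join(CODON_TO_AA[rna[i:i + 3]] for i in range(0, len(rna), 3))


def silent_variants(rna: str) -> list:
    """Every sequence encoding the same polypeptide as `rna` (its silent variations)."""
    protein = translate(rna)
    return ["".join(codons) for codons in product(*(SYNONYMOUS_CODONS[aa] for aa in protein))]
```

For `AUGGCAUGG` (Met-Ala-Trp), only the alanine codon can vary, so the sketch yields four silent variants, all translating to `MAW`.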

As to amino acid sequences, one of skill will recognize that individual
substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein
sequence which alters, adds or deletes a single amino acid or a small percentage of amino
acids in the encoded sequence is a "conservatively modified variant" where the alteration
results in the substitution of an amino acid with a chemically similar amino acid.
Conservative substitution tables providing functionally similar amino acids are well known in
the art. Such conservatively modified variants are in addition to and do not exclude
polymorphic variants, interspecies homologs, and alleles of the invention.

The following eight groups each contain amino acids that are conservative 
substitutions for one another: 1) Alanine (A), Glycine (G); 2) Aspartic acid (D), Glutamic acid
(E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I),
Leucine (L), Methionine (M), Valine (V); 6) Phenylalanine (F), Tyrosine (Y), Tryptophan 
(W); 7) Serine (S), Threonine (T); and 8) Cysteine (C), Methionine (M) (see, e.g., Creighton, 
Proteins (1984)). 
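
The eight groups above can be encoded directly as sets to check whether two residues, or two equal-length polypeptides, differ only by conservative substitutions. This is a minimal sketch; treating an identical residue as trivially conservative is a convention assumed here. Note that methionine appears in both group 5 and group 8, so it is conservative with respect to both leucine and cysteine even though leucine and cysteine are not conservative with respect to each other.

```python
# The eight conservative-substitution groups listed above (one-letter codes).
CONSERVATIVE_GROUPS = [
    {"A", "G"}, {"D", "E"}, {"N", "Q"}, {"R", "K"},
    {"I", "L", "M", "V"}, {"F", "Y", "W"}, {"S", "T"}, {"C", "M"},
]


def is_conservative(a: str, b: str) -> bool:
    """True if a and b are identical or share at least one group."""
    a, b = a.upper(), b.upper()
    return a == b or any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def differs_only_conservatively(seq1: str, seq2: str) -> bool:
    """For equal-length sequences: every position is identical or a conservative substitution."""
    return len(seq1) == len(seq2) and all(is_conservative(x, y) for x, y in zip(seq1, seq2))
```

For instance, `AKLS` and `GRIT` differ at every position, but each change (A/G, K/R, L/I, S/T) falls within one group, so the check returns `True`.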

Macromolecular structures such as polypeptide structures can be described in
terms of various levels of organization. For a general discussion of this organization, see,
e.g., Alberts et al., Molecular Biology of the Cell (3rd ed., 1994) and Cantor and Schimmel,
Biophysical Chemistry Part I: The Conformation of Biological Macromolecules (1980).
"Primary structure" refers to the amino acid sequence of a particular peptide. "Secondary
structure" refers to locally ordered, three dimensional structures within a polypeptide. These
structures are commonly known as domains. Domains are portions of a polypeptide that
form a compact unit of the polypeptide and are typically 25 to approximately 500 amino
acids long. Typical domains are made up of sections of lesser organization such as stretches
of β-sheet and α-helices. "Tertiary structure" refers to the complete three dimensional
structure of a polypeptide monomer. "Quaternary structure" refers to the three dimensional
structure formed, usually by the noncovalent association of independent tertiary units.
Anisotropic terms are also known as energy terms.

A "label" or a "detectable moiety" is a composition detectable by 
spectroscopic, photochemical, biochemical, immunochemical, chemical, or other physical 
means. For example, useful labels include 32P, fluorescent dyes, electron-dense reagents,
enzymes (e.g., as commonly used in an ELISA), biotin, digoxigenin, or haptens and proteins
which can be made detectable, e.g., by incorporating a radiolabel into the peptide or used to 
detect antibodies specifically reactive with the peptide. 

An "effector" or "effector moiety" or "effector component" is a molecule that 
is bound (or linked, or conjugated), either covalently, through a linker or a chemical bond, or 
noncovalently, through ionic, van der Waals, electrostatic, or hydrogen bonds, to an antibody.
The "effector" can be a variety of molecules including, for example, detection moieties
including radioactive compounds, fluorescent compounds, an enzyme or substrate, tags such
as epitope tags, a toxin; a chemotherapeutic agent; a lipase; an antibiotic; or a radioisotope
emitting "hard" radiation, e.g., beta radiation.

A "labeled nucleic acid probe or oligonucleotide" is one that is bound, either 
covalently, through a linker or a chemical bond, or noncovalently, through ionic, van der 
Waals, electrostatic, or hydrogen bonds to a label such that the presence of the probe may be 
detected by detecting the presence of the label bound to the probe. Alternatively, methods
using high affinity interactions may achieve the same results where one of a pair of binding
partners binds to the other, e.g., biotin and streptavidin.

As used herein a "nucleic acid probe or oligonucleotide" is defined as a 
nucleic acid capable of binding to a target nucleic acid of complementary sequence through 
one or more types of chemical bonds, usually through complementary base pairing, usually 
through hydrogen bond formation. As used herein, a probe may include natural (i.e., A, G, C,
or T) or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in a probe 
may be joined by a linkage other than a phosphodiester bond, so long as it does not interfere 
with hybridization. Thus, for example, probes may be peptide nucleic acids in which the 
constituent bases are joined by peptide bonds rather than phosphodiester linkages. It will be 
understood by one of skill in the art that probes may bind target sequences lacking complete 
complementarity with the probe sequence depending upon the stringency of the hybridization 
conditions. The probes are preferably directly labeled as with isotopes, chromophores, 
lumiphores, chromogens, or indirectly labeled such as with biotin to which a streptavidin 
complex may later bind. By assaying for the presence or absence of the probe, one can detect 
the presence or absence of the select sequence or subsequence. 
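
The complementary base pairing described above can be illustrated with a short Python sketch that finds where a DNA probe could bind a target strand by comparing the target against the probe's reverse complement. The `max_mismatches` parameter is a crude, hypothetical stand-in for hybridization stringency (real stringency depends on temperature, salt, probe length and composition), and the sketch handles only unmodified A, C, G and T bases.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def probe_binding_sites(probe: str, target: str, max_mismatches: int = 0) -> list:
    """Start positions in `target` (5'->3') where `probe` could pair antiparallel,
    allowing up to `max_mismatches` non-complementary positions."""
    site = reverse_complement(probe)  # target sequence a perfectly matched probe pairs with
    hits = []
    for i in range(len(target) - len(site) + 1):
        window = target[i:i + len(site)].upper()
        mismatches = sum(1 for x, y in zip(window, site) if x != y)
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits
```

With `max_mismatches=0` only perfectly complementary sites are reported; raising it mimics, very roughly, the lower-stringency conditions under which a probe may bind a target lacking complete complementarity.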

The term "recombinant" when used with reference, e.g., to a cell, or nucleic 
acid, protein, or vector, indicates that the cell, nucleic acid, protein or vector, has been 
modified by the introduction of a heterologous nucleic acid or protein or the alteration of a 
native nucleic acid or protein, or that the cell is derived from a cell so modified. Thus, for 
example, recombinant cells express genes that are not found within the native (non- 
recombinant) form of the cell or express native genes that are otherwise abnormally 
expressed, underexpressed or not expressed at all.

The term "heterologous" when used with reference to portions of a nucleic
acid indicates that the nucleic acid comprises two or more subsequences that are not found in 
the same relationship to each other in nature. For instance, the nucleic acid is typically 
recombinantly produced, having two or more sequences from unrelated genes arranged to 
make a new functional nucleic acid, e.g., a promoter from one source and a coding region 
from another source. Similarly, a heterologous protein indicates that the protein comprises
two or more subsequences that are not found in the same relationship to each other in nature 
(e.g., a fusion protein). 

A "promoter" is defined as an array of nucleic acid control sequences that 
direct transcription of a nucleic acid. As used herein, a promoter includes necessary nucleic 
acid sequences near the start site of transcription, such as, in the case of a polymerase II type 
promoter, a TATA element. A promoter also optionally includes distal enhancer or repressor 
elements, which can be located as much as several thousand base pairs from the start site of 
transcription. A "constitutive" promoter is a promoter that is active under most 
environmental and developmental conditions. An "inducible" promoter is a promoter that is 
active under environmental or developmental regulation. The term "operably linked" refers 
to a functional linkage between a nucleic acid expression control sequence (such as a 
promoter, or array of transcription factor binding sites) and a second nucleic acid sequence, 
wherein the expression control sequence directs transcription of the nucleic acid 
corresponding to the second sequence. 

An "expression vector" is a nucleic acid construct, generated recombinantly or 
synthetically, with a series of specified nucleic acid elements that permit transcription of a 
particular nucleic acid in a host cell. The expression vector can be part of a plasmid, virus, or
nucleic acid fragment. Typically, the expression vector includes a nucleic acid to be 
transcribed operably linked to a promoter. 

The phrase "selectively (or specifically) hybridizes to" refers to the binding, 
duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under 
stringent hybridization conditions when that sequence is present in a complex mixture (e.g.,
total cellular or library DNA or RNA).

The phrase "stringent hybridization conditions" refers to conditions under 
which a probe will hybridize to its target subsequence, typically in a complex mixture of 
nucleic acids, but to no other sequences. Stringent conditions are sequence-dependent and 
will be different in different circumstances. Longer sequences hybridize specifically at 
higher temperatures. An extensive guide to the hybridization of nucleic acids is found in 
Tijssen, Techniques in Biochemistry and Molecular Biology--Hybridization with Nucleic
Probes, "Overview of principles of hybridization and the strategy of nucleic acid assays"
(1993). Generally, stringent conditions are selected to be about 5-10°C lower than the
thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is
the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50%
of the probes complementary to the target hybridize to the target sequence at equilibrium (as
the target sequences are present in excess, at Tm, 50% of the probes are occupied at
equilibrium). Stringent conditions will be those in which the salt concentration is less than 
about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion concentration (or other 
salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., 10 to 
50 nucleotides) and at least about 60°C for long probes (e.g., greater than 50 nucleotides). 
Stringent conditions may also be achieved with the addition of destabilizing agents such as 
formamide. For selective or specific hybridization, a positive signal is at least two times 
background, preferably 10 times background hybridization. Exemplary stringent 
hybridization conditions can be as follows: 50% formamide, 5x SSC, and 1% SDS,
incubating at 42°C, or, 5x SSC, 1% SDS, incubating at 65°C, with wash in 0.2x SSC, and 
0.1% SDS at 65°C. For PCR, a temperature of about 36°C is typical for low stringency 
amplification, although annealing temperatures may vary between about 32°C and 48°C 
depending on primer length. For high stringency PCR amplification, a temperature of about 
62°C is typical, although high stringency annealing temperatures can range from about 50°C 
to about 65°C, depending on the primer length and specificity. Typical cycle conditions for 
both high and low stringency amplifications include a denaturation phase of 90°C - 95°C for 
30 sec - 2 min., an annealing phase lasting 30 sec. - 2 min., and an extension phase of about 
72°C for 1 - 2 min. Protocols and guidelines for low and high stringency amplification 
reactions are provided, e.g., in Innis et al. (1990) PCR Protocols, A Guide to Methods and
Applications (Academic Press, Inc., N.Y.).
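To make the relationship between Tm and the stringency window concrete, the following Python sketch estimates Tm for a short oligonucleotide and derives a hybridization temperature about 5-10°C below it. The Wallace rule used here (2°C per A or T plus 4°C per G or C) is an assumption introduced for illustration: it is a rough approximation suited only to short probes, it is not the method of this application, and all function names are hypothetical. The PCR helper simply encodes the low- and high-stringency annealing ranges quoted above.

```python
# Hypothetical helpers; the Wallace rule is an illustrative assumption,
# reasonable only as a rough Tm estimate for short oligonucleotides.


def wallace_tm(seq: str) -> float:
    """Rough Tm in degrees C for a short DNA oligo: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_window(tm: float) -> tuple:
    """Hybridization temperature range about 5-10 degrees C below Tm."""
    return (tm - 10.0, tm - 5.0)


def pcr_annealing_stringency(temp_c: float) -> str:
    """Classify a PCR annealing temperature using the ranges quoted in the text."""
    if 32.0 <= temp_c <= 48.0:
        return "low"
    if 50.0 <= temp_c <= 65.0:
        return "high"
    return "outside quoted ranges"


probe = "ACGTACGTACGTACGTAC"  # hypothetical 18-mer: 9 A/T and 9 G/C
tm = wallace_tm(probe)
print(tm, stringent_window(tm))  # 54.0 (44.0, 49.0)
print(pcr_annealing_stringency(36.0), pcr_annealing_stringency(62.0))  # low high
```

For longer probes a salt- and length-corrected formula, or an empirical melting curve, would be more appropriate than this rule of thumb.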

Nucleic acids that do not hybridize to each other under stringent conditions are 
still substantially identical if the polypeptides which they encode are substantially identical. 
This occurs, for example, when a copy of a nucleic acid is created using the maximum codon 
degeneracy permitted by the genetic code. In such cases, the nucleic acids typically hybridize 
under moderately stringent hybridization conditions. Exemplary "moderately stringent 
hybridization conditions" include a hybridization in a buffer of 40% formamide, 1 M NaCl, 
1% SDS at 37°C, and a wash in 1X SSC at 45°C. A positive hybridization is at least twice
background. Those of ordinary skill will readily recognize that alternative hybridization and 
wash conditions can be utilized to provide conditions of similar stringency. Additional 
guidelines for determining hybridization parameters are provided in numerous references, e.g.,
Current Protocols in Molecular Biology, ed. Ausubel, et al.
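The codon-degeneracy case can be illustrated with a short Python sketch, assuming a deliberately partial codon table that contains only the codons used in the example (taken from the standard genetic code). Two hypothetical coding sequences that differ at five of fifteen positions translate to the same tetrapeptide, which is why such nucleic acids can be substantially identical in the sense above without hybridizing under stringent conditions.

```python
# Only the codons needed for this example (a subset of the standard genetic code).
CODON_TABLE = {
    "ATG": "M",
    "GCT": "A", "GCG": "A",
    "CTG": "L", "TTA": "L",
    "AAA": "K", "AAG": "K",
    "TAA": "*", "TGA": "*",  # stop codons
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence until the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):  # step through complete codons only
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


seq1 = "ATGGCTCTGAAATAA"  # ATG GCT CTG AAA TAA
seq2 = "ATGGCGTTAAAGTGA"  # ATG GCG TTA AAG TGA (synonymous codons)
mismatches = sum(a != b for a, b in zip(seq1, seq2))
print(translate(seq1), translate(seq2), mismatches)  # MALK MALK 5
```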
The phrase "functional effects" in the context of assays for testing compounds
that modulate activity of an angiogenesis protein includes the determination of a parameter 
that is indirectly or directly under the influence of the angiogenesis protein, e.g., a functional, 
physical, or chemical effect, such as the ability to increase or decrease angiogenesis. It 
includes binding activity, the ability of cells to proliferate, expression in cells undergoing
angiogenesis, and other characteristics of angiogenic cells. "Functional effects" include in 
vitro, in vivo, and ex vivo activities. 

By "determining the functional effect" is meant assaying for a compound that
increases or decreases a parameter that is indirectly or directly under the influence of an
angiogenesis protein sequence, e.g., functional, physical and chemical effects. Such
functional effects can be measured by any means known to those skilled in the art, e.g.,
changes in spectroscopic characteristics (e.g., fluorescence, absorbance, refractive index),
hydrodynamic (e.g., shape), chromatographic, or solubility properties for the protein;
measuring inducible markers or transcriptional activation of the angiogenesis protein;
measuring binding activity or binding assays, e.g., binding to antibodies; and measuring
cellular proliferation, particularly endothelial cell proliferation. Determination of the
functional effect of a compound on angiogenesis can also be performed using angiogenesis
assays known to those of skill in the art, such as in vitro assays, e.g., in vitro endothelial
cell tube formation assays, and other assays such as the chick CAM assay, the mouse corneal
assay, and assays that assess vascularization of an implanted tumor. The functional effects
can be evaluated by many means known to those skilled in the art, e.g., microscopy for
quantitative or qualitative measures of alterations in morphological features, e.g., tube or
blood vessel formation; measurement of changes in RNA or protein levels for
angiogenesis-associated sequences; measurement of RNA stability; identification of
downstream or reporter gene expression (CAT, luciferase, β-gal, GFP and the like), e.g., via
chemiluminescence, fluorescence, colorimetric reactions, antibody binding, inducible
markers, and ligand binding assays.

"Inhibitors", "activators", and "modulators" of angiogenic polynucleotide and
polypeptide sequences are used to refer to activating, inhibitory, or modulating molecules
identified using in vitro and in vivo assays of angiogenic polynucleotide and polypeptide
sequences. Inhibitors are compounds that, e.g., bind to, partially or totally block activity,
decrease, prevent, delay activation, inactivate, desensitize, or down regulate the activity or
expression of angiogenesis proteins, e.g., antagonists. "Activators" are compounds that
increase, open, activate, facilitate, enhance activation, sensitize, agonize, or up regulate
angiogenesis protein activity. Inhibitors, activators, or modulators also include genetically
modified versions of angiogenesis proteins, e.g., versions with altered activity, as well as 
naturally occurring and synthetic ligands, antagonists, agonists, antibodies, small chemical 
molecules and the like. Such assays for inhibitors and activators include, e.g., expressing the 
angiogenic protein in vitro, in cells, or cell membranes, applying putative modulator 
compounds, and then determining the functional effects on activity, as described above. 
Activators and inhibitors of angiogenesis can also be identified by incubating angiogenic 
cells with the test compound and determining increases or decreases in the expression of 1 or 
more angiogenesis proteins, e.g., 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50 or more angiogenesis 
proteins, such as angiogenesis proteins comprising the sequences set out in Table 2. 

Samples or assays comprising angiogenesis proteins that are treated with a 
potential activator, inhibitor, or modulator are compared to control samples without the 
inhibitor, activator, or modulator to examine the extent of inhibition. Control samples 
(untreated with inhibitors) are assigned a relative protein activity value of 100%. Inhibition 
of a polypeptide is achieved when the activity value relative to the control is about 80%, 
preferably 50%, more preferably 25-0%. Activation of an angiogenesis polypeptide is 
achieved when the activity value relative to the control (untreated with activators) is 110%,
more preferably 150%, more preferably 200-500% (i.e., two to five fold higher relative to the 
control), more preferably 1000-3000% higher. 
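The percentage cut-offs above amount to simple arithmetic on assay readouts. The following is a minimal Python sketch, assuming a single scalar activity readout per sample and using the least stringent thresholds stated (about 80% for inhibition, 110% for activation) as adjustable defaults; the function names and readouts are hypothetical.

```python
def relative_activity(treated: float, control: float) -> float:
    """Activity of a treated sample as a percentage of its untreated control."""
    if control <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * treated / control


def classify_effect(rel_pct: float,
                    inhibition_max: float = 80.0,
                    activation_min: float = 110.0) -> str:
    """Apply the 'about 80%' and '110%' cut-offs; stricter values can be passed in."""
    if rel_pct <= inhibition_max:
        return "inhibition"
    if rel_pct >= activation_min:
        return "activation"
    return "no clear effect"


for treated in (40.0, 95.0, 250.0):  # hypothetical readouts, control = 100
    pct = relative_activity(treated, 100.0)
    print(pct, classify_effect(pct))
# 40.0 inhibition
# 95.0 no clear effect
# 250.0 activation
```

The more preferred levels (e.g., 25% for inhibition or 200% for activation) correspond to calling `classify_effect` with `inhibition_max=25.0` or `activation_min=200.0`.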

"Antibody" refers to a polypeptide comprising a framework region from an 
immunoglobulin gene or fragments thereof that specifically binds and recognizes an antigen. 
The recognized immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, 
epsilon, and mu constant region genes, as well as the myriad immunoglobulin variable region 
genes. Light chains are classified as either kappa or lambda. Heavy chains are classified as 
gamma, mu, alpha, delta, or epsilon, which in turn define the immunoglobulin classes, IgG, 
IgM, IgA, IgD and IgE, respectively. Typically, the antigen-binding region of an antibody 
will be most critical in specificity and affinity of binding. 

An exemplary immunoglobulin (antibody) structural unit comprises a 
tetramer. Each tetramer is composed of two identical pairs of polypeptide chains, each pair 
having one "light" (about 25 kD) and one "heavy" chain (about 50-70 kD). The N-terminus
of each chain defines a variable region of about 100 to 110 or more amino acids primarily
responsible for antigen recognition. The terms variable light chain (VL) and variable heavy
chain (VH) refer to these light and heavy chains respectively.
Antibodies exist, e.g., as intact immunoglobulins or as a number of
well-characterized fragments produced by digestion with various peptidases. Thus, for example,
pepsin digests an antibody below the disulfide linkages in the hinge region to produce F(ab')2,
a dimer of Fab which itself is a light chain joined to VH-CH1 by a disulfide bond. The F(ab')2
may be reduced under mild conditions to break the disulfide linkage in the hinge region,
thereby converting the F(ab')2 dimer into an Fab' monomer. The Fab' monomer is
essentially Fab with part of the hinge region (see Fundamental Immunology (Paul ed., 3d ed.
1993)). While various antibody fragments are defined in terms of the digestion of an intact
antibody, one of skill will appreciate that such fragments may be synthesized de novo either
chemically or by using recombinant DNA methodology. Thus, the term antibody, as used
herein, also includes antibody fragments either produced by the modification of whole
antibodies, or those synthesized de novo using recombinant DNA methodologies (e.g., single
chain Fv) or those identified using phage display libraries (see, e.g., McCafferty et al., Nature
348:552-554 (1990)).

For preparation of antibodies, e.g., recombinant, monoclonal, or polyclonal
antibodies, many techniques known in the art can be used (see, e.g., Kohler & Milstein,
Nature 256:495-497 (1975); Kozbor et al., Immunology Today 4:72 (1983); Cole et al., pp.
77-96 in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc. (1985); Coligan,
Current Protocols in Immunology (1991); Harlow & Lane, Antibodies, A Laboratory Manual
(1988); and Goding, Monoclonal Antibodies: Principles and Practice (2d ed. 1986)).
Techniques for the production of single chain antibodies (U.S. Patent 4,946,778) can be
adapted to produce antibodies to polypeptides of this invention. Also, transgenic mice, or
other organisms such as other mammals, may be used to express humanized antibodies.
Alternatively, phage display technology can be used to identify antibodies and heteromeric
Fab fragments that specifically bind to selected antigens (see, e.g., McCafferty et al., Nature
348:552-554 (1990); Marks et al., Biotechnology 10:779-783 (1992)).

A "chimeric antibody" is an antibody molecule in which (a) the constant 
region, or a portion thereof, is altered, replaced or exchanged so that the antigen binding site 
(variable region) is linked to a constant region of a different or altered class, effector function 
and/or species, or an entirely different molecule which confers new properties to the chimeric
antibody, e.g., an enzyme, toxin, hormone, growth factor, drug, etc.; or (b) the variable 
region, or a portion thereof, is altered, replaced or exchanged with a variable region having a 
different or altered antigen specificity. 
The present application may be related to USSN 09/437,702, filed Nov. 10,
1999; USSN 09/437,528, filed Nov. 10, 1999; USSN 09/434,197, filed Nov. 4, 1999; USSN 
60/183,926, filed Feb. 22, 2000; USSN 09/440,493, filed Nov. 15, 1999; USSN 09/520,478, 
filed Mar. 8, 2000; USSN 09/440,369, filed Nov. 12, 1999; Attorney Docket number 
A68928, filed Dec. 15, 2000; Attorney Docket number A69789, filed Jan. 22, 2001; and 
Attorney Docket number A69806, filed Dec. 15, 2000. 

The detailed description of the invention includes discussion of the following
aspects of the invention:

- Expression of angiogenesis-associated sequences
- Informatics
- Angiogenesis-associated sequences
- Detection of angiogenesis sequence for diagnostic and therapeutic applications
- Modulators of angiogenesis
- Methods of identifying variant angiogenesis-associated sequences
- Administration of pharmaceutical and vaccine compositions
- Kits for use in diagnostic and/or prognostic applications

Expression of angiogenesis-associated sequences 

In one aspect, the expression levels of genes are determined in different 
patient samples for which diagnosis information is desired, to provide expression profiles. 
An expression profile of a particular sample is essentially a "fingerprint" of the state of the 
sample; while two states may have any particular gene similarly expressed, the evaluation of 
a number of genes simultaneously allows the generation of a gene expression profile that is 
unique to the state of the cell. That is, normal tissue may be distinguished from angiogenic tissue.
By comparing expression profiles of tissue in known different angiogenesis states, 
information regarding which genes are important (including both up- and down-regulation of 
genes) in each of these states is obtained. The identification of sequences that are 
differentially expressed in angiogenic versus non-angiogenic tissue allows the use of this 
information in a number of ways. For example, a particular treatment regimen may be
evaluated: does a chemotherapeutic drug act to down-regulate angiogenesis, and thus tumor
growth or recurrence, in a particular patient? Similarly, diagnoses may be made or confirmed,
and treatment outcomes assessed, by comparing patient samples with the known expression profiles.
Angiogenic tissue can also be analyzed to determine the stage of angiogenesis in the tissue. 
Furthermore, these gene expression profiles (or individual genes) allow screening of drug 
candidates with an eye to mimicking or altering a particular expression profile; for example,
screening can be done for drugs that suppress the angiogenic expression profile. This may be 
done by making biochips comprising sets of the important angiogenesis genes, which can 
then be used in these screens. These methods can also be done on the protein basis; that is, 
protein expression levels of the angiogenic proteins can be evaluated for diagnostic purposes
or to screen candidate agents. In addition, the angiogenic nucleic acid sequences can be 
administered for gene therapy purposes, including the administration of antisense nucleic 
acids, or the angiogenic proteins (including antibodies and other modulators thereof) 
administered as therapeutic drugs. 
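Comparing a patient sample with known expression profiles can be done in many ways. The Python example below assumes one simple choice, Pearson correlation between expression vectors measured over the same genes in the same order, as a stand-in technique; it is not a method specified in this application, and the reference profiles and patient values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (ZeroDivisionError) for a flat profile


def closest_state(sample, references):
    """Return the reference state whose profile best correlates with the sample."""
    return max(references, key=lambda state: pearson(sample, references[state]))


# Hypothetical expression levels for the same four genes, in the same order.
references = {
    "non-angiogenic": [1.0, 1.0, 5.0, 5.0],
    "angiogenic": [6.0, 4.0, 1.0, 1.0],
}
patient = [5.5, 3.5, 1.2, 0.9]
print(closest_state(patient, references))  # angiogenic
```

Correlation rewards a matching pattern across many genes rather than agreement on any single gene, which mirrors the "fingerprint" idea; in practice, normalized or log-transformed values would usually be compared.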

Thus the present invention provides nucleic acid and protein sequences that
are differentially expressed in angiogenesis, herein termed "angiogenesis sequences". As
outlined below, angiogenesis sequences include those that are up-regulated (i.e., expressed at
a higher level) in disorders associated with angiogenesis, as well as those that are
down-regulated (i.e., expressed at a lower level). In a preferred embodiment, the angiogenesis
sequences are from humans; however, as will be appreciated by those in the art, angiogenesis
sequences from other organisms may be useful in animal models of disease and drug
evaluation; thus, other angiogenesis sequences are provided, from vertebrates, including
mammals, including rodents (rats, mice, hamsters, guinea pigs, etc.), primates, farm animals
(including sheep, goats, pigs, cows, horses, etc.). Angiogenesis sequences from other
organisms may be obtained using the techniques outlined below.

Angiogenesis sequences can include both nucleic acid and amino acid
sequences. In a preferred embodiment, the angiogenesis sequences are recombinant nucleic
acids. By the term "recombinant nucleic acid" herein is meant nucleic acid, originally formed
in vitro, in general, by the manipulation of nucleic acid, e.g., using polymerases and
endonucleases, in a form not normally found in nature. Thus an isolated nucleic acid, in a
linear form, or an expression vector formed in vitro by ligating DNA molecules that are not
normally joined, are both considered recombinant for the purposes of this invention. It is
understood that once a recombinant nucleic acid is made and reintroduced into a host cell or
organism, it will replicate non-recombinantly, i.e., using the in vivo cellular machinery of the
host cell rather than in vitro manipulations; however, such nucleic acids, once produced
recombinantly, although subsequently replicated non-recombinantly, are still considered
recombinant for the purposes of the invention.

Similarly, a "recombinant protein" is a protein made using recombinant
techniques, i.e., through the expression of a recombinant nucleic acid as depicted above. A
recombinant protein is distinguished from naturally occurring protein by at least one or more
characteristics. For example, the protein may be isolated or purified away from some or all
of the proteins and compounds with which it is normally associated in its wild type host, and
thus may be substantially pure. For example, an isolated protein is unaccompanied by at least
some of the material with which it is normally associated in its natural state, preferably
constituting at least about 0.5%, more preferably at least about 5% by weight of the total
protein in a given sample. A substantially pure protein comprises at least about 75% by
weight of the total protein, with at least about 80% being preferred, and at least about 90%
being particularly preferred. The definition includes the production of an angiogenesis protein
from one organism in a different organism or host cell. Alternatively, the protein may be
made at a significantly higher concentration than is normally seen, through the use of an
inducible promoter or high expression promoter, such that the protein is made at increased
concentration levels. Alternatively, the protein may be in a form not normally found in
nature, as in the addition of an epitope tag or amino acid substitutions, insertions and
deletions, as discussed below.

In a preferred embodiment, the angiogenesis sequences are nucleic acids. As
will be appreciated by those in the art and is more fully outlined below, angiogenesis
sequences are useful in a variety of applications, including diagnostic applications, which will
detect naturally occurring nucleic acids, as well as screening applications; for example,
biochips comprising nucleic acid probes to the angiogenesis sequences can be generated. In
the broadest sense, then, by "nucleic acid" or "oligonucleotide" or grammatical equivalents
herein is meant at least two nucleotides covalently linked together. A nucleic acid of the
present invention will generally contain phosphodiester bonds, although in some cases,
nucleic acid analogs are included that may have alternate backbones, comprising, for
example, phosphoramidate, phosphorothioate, phosphorodithioate, or
O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A Practical
Approach, Oxford University Press); and peptide nucleic acid backbones and linkages. Other
analog nucleic acids include those with positive backbones, non-ionic backbones, and
non-ribose backbones, including those described in U.S. Patent Nos. 5,235,033 and 5,034,506,
and Chapters 6 and 7, ACS Symposium Series 580, "Carbohydrate Modifications in
Antisense Research", Ed. Y.S. Sanghvi and P. Dan Cook. Nucleic acids containing one or
more carbocyclic sugars are also included within one definition of nucleic acids.
Modifications of the ribose-phosphate backbone may be done for a variety of reasons, for
example to increase the stability and half-life of such molecules in physiological
environments or as probes on a biochip.

As will be appreciated by those in the art, nucleic acid analogs may find use in 
the present invention. In addition, mixtures of naturally occurring nucleic acids and analogs 
can be made; alternatively, mixtures of different nucleic acid analogs, and mixtures of
naturally occurring nucleic acids and analogs may be made. 

Particularly preferred are peptide nucleic acids (PNA), which include peptide
nucleic acid analogs. These backbones are substantially non-ionic under neutral conditions, in
contrast to the highly charged phosphodiester backbone of naturally occurring nucleic acids.
This results in two advantages. First, the PNA backbone exhibits improved hybridization
kinetics. PNAs have larger changes in the melting temperature (Tm) for mismatched versus
perfectly matched basepairs. DNA and RNA typically exhibit a 2-4°C drop in Tm for an
internal mismatch. With the non-ionic PNA backbone, the drop is closer to 7-9°C. Similarly,
due to their non-ionic nature, hybridization of the bases attached to these backbones is
relatively insensitive to salt concentration. In addition, PNAs are not degraded by cellular
enzymes, and thus can be more stable.

The nucleic acids may be single stranded or double stranded, as specified, or
contain portions of both double stranded and single stranded sequence. As will be appreciated
by those in the art, the depiction of a single strand also defines the sequence of the
complementary strand; thus the sequences described herein also provide the complement of
the sequence. The nucleic acid may be DNA, both genomic and cDNA, RNA or a hybrid,
where the nucleic acid may contain combinations of deoxyribo- and ribo-nucleotides, and
combinations of bases, including uracil, adenine, thymine, cytosine, guanine, inosine,
xanthine, hypoxanthine, isocytosine, isoguanine, etc. As used herein, the term "nucleoside"
includes nucleotides and nucleoside and nucleotide analogs, and modified nucleosides such
as amino modified nucleosides. In addition, "nucleoside" includes non-naturally occurring
analog structures. Thus for example the individual units of a peptide nucleic acid, each
containing a base, are referred to herein as a nucleoside.
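Because one strand determines its complement, the complementary sequence can be derived mechanically. The following is a minimal Python sketch, assuming plain DNA written 5' to 3' and containing only the bases A, C, G and T; analogs such as inosine, and uracil in RNA, are deliberately rejected rather than guessed. The function name is hypothetical.

```python
# Base-pairing map for unmodified DNA (upper and lower case).
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of a DNA sequence, written 5' to 3'."""
    unknown = set(seq.upper()) - set("ACGT")
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    # Complement each base, then reverse so the result also reads 5' to 3'.
    return seq.translate(_DNA_COMPLEMENT)[::-1]


strand = "ATGGCC"
print(reverse_complement(strand))  # GGCCAT
```

Applying the function twice returns the original strand, which reflects the point that either strand fully specifies the duplex.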

An angiogenesis sequence can be initially identified by substantial nucleic
acid and/or amino acid sequence homology to the angiogenesis sequences outlined herein.
Such homology can be based upon the overall nucleic acid or amino acid sequence, and is 
generally determined as outlined below, using either homology programs or hybridization 
conditions. 
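Homology programs typically report percent identity over an alignment. The Python sketch below assumes the two sequences have already been aligned, with '-' marking gaps, and simply counts identical non-gap columns against the full alignment length. It does not perform the alignment itself, gap-handling conventions differ between programs, and the function name and example peptides are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding identical, non-gap residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must already be aligned to equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * matches / len(aligned_a)


print(percent_identity("MALKV", "MALRV"))  # 80.0 (one substitution)
print(percent_identity("MA-KV", "MALKV"))  # 80.0 (one gap column)
```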
For identifying angiogenesis-associated sequences, the angiogenesis screen
typically includes comparing genes identified in a modification of an in vitro model of 
angiogenesis as described in Hiraoka, Cell 95:365 (1998) with genes identified in controls. 
Samples of normal tissue and tissue undergoing angiogenesis are applied to biochips 
comprising nucleic acid probes. The samples are first microdissected, if applicable, and 
treated as is known in the art for the preparation of mRNA. Suitable biochips are 
commercially available, for example from Affymetrix. Gene expression profiles as described 
herein are generated and the data analyzed. 

In a preferred embodiment, the genes showing changes in expression as 
between normal and disease states are compared to genes expressed in other normal tissues, 
including, but not limited to lung, heart, brain, liver, breast, kidney, muscle, prostate, small 
intestine, large intestine, spleen, bone and placenta. In a preferred embodiment, those genes 
identified during the angiogenesis screen that are expressed in any significant amount in other 
tissues are removed from the profile, although in some embodiments, this is not necessary. 
That is, when screening for drugs, it is usually preferable that the target be disease specific, to
minimize possible side effects.
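The removal of genes expressed in other tissues can be expressed as a filter. This Python sketch assumes a per-tissue expression level for each candidate gene and a hypothetical cut-off standing in for "any significant amount"; the gene names, levels and threshold are illustrative only. Genes with no data for other tissues are retained, a design choice that could equally be reversed.

```python
def tissue_specific(candidates, other_tissue_levels, max_level=50.0):
    """Keep candidate genes whose level in every other normal tissue is <= max_level.

    Genes absent from other_tissue_levels are kept (no evidence of expression).
    """
    kept = []
    for gene in candidates:
        levels = other_tissue_levels.get(gene, {})
        if all(level <= max_level for level in levels.values()):
            kept.append(gene)
    return kept


# Hypothetical genes and expression levels (arbitrary units).
other_tissues = {
    "GENE_A": {"lung": 10.0, "heart": 5.0, "liver": 0.0},
    "GENE_B": {"lung": 12.0, "liver": 900.0},
}
print(tissue_specific(["GENE_A", "GENE_B"], other_tissues))  # ['GENE_A']
```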

In a preferred embodiment, angiogenesis sequences are those that are up- 
regulated in angiogenesis disorders; that is, the expression of these genes is higher in the 
disease tissue as compared to normal tissue. "Up-regulation" as used herein means at least 
about a two-fold change, preferably at least about a three fold change, with at least about 
five-fold or higher being preferred. All accession numbers herein are for the GenBank 
sequence database and the sequences of the accession numbers are hereby expressly 
incorporated by reference. GenBank is known in the art, see, e.g., Benson, DA, et al., 
Nucleic Acids Research 26:1-7 (1998) and http://www.ncbi.nlm.nih.gov/. Sequences are also 
available in other databases, e.g., European Molecular Biology Laboratory (EMBL) and
DNA Database of Japan (DDBJ). In addition, most preferred genes were found to be 
expressed in a limited amount or not at all in heart, brain, lung, liver, breast, kidney, prostate, 
small intestine and spleen. 

In another preferred embodiment, angiogenesis sequences are those that are 
down-regulated in angiogenesis disorders; that is, the expression of these genes is lower in
angiogenic tissue as compared to normal tissue. "Down-regulation" as used herein means at 
least about a two-fold change, preferably at least about a three fold change, with at least about 
five-fold or higher being preferred. 
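The fold-change definitions of up- and down-regulation translate directly into a classification rule. The following is a minimal Python sketch, assuming positive, already-normalized expression levels for a gene in angiogenic and normal tissue. The default two-fold threshold is the minimum stated above; the preferred three- and five-fold levels can be passed in as `min_fold`. The function name is hypothetical.

```python
def regulation(disease_level: float, normal_level: float, min_fold: float = 2.0) -> str:
    """Call a gene up- or down-regulated if it changes by at least min_fold."""
    if disease_level <= 0 or normal_level <= 0:
        raise ValueError("expression levels must be positive")
    ratio = disease_level / normal_level
    if ratio >= min_fold:
        return "up"
    if ratio <= 1.0 / min_fold:
        return "down"
    return "unchanged"


print(regulation(600.0, 100.0))                # up (6-fold higher)
print(regulation(30.0, 100.0))                 # down (about 3.3-fold lower)
print(regulation(150.0, 100.0))                # unchanged
print(regulation(600.0, 100.0, min_fold=5.0))  # up, even at the five-fold cut-off
```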
Angiogenesis sequences according to the invention may be classified into
discrete clusters of sequences based on common expression profiles of the sequences. 
Expression levels of angiogenesis sequences may increase or decrease as a function of time in 
a manner that correlates with the induction of angiogenesis. Alternatively, expression levels 
of angiogenesis sequences may both increase and decrease as a function of time. For 
example, expression levels of some angiogenesis sequences are temporarily induced or 
diminished during the switch to the angiogenesis phenotype, followed by a return to baseline 
expression levels. Table 1 provides genes, the mRNA expression of which varies as a 
function of time in angiogenesis tissue when compared to normal tissue. 

Table 2 provides protein sequences corresponding to the coding regions of the 
sequences that undergo changes in expression as a function of time in tissue undergoing 
angiogenesis. 

In a particularly preferred embodiment, angiogenesis sequences are those that 
are induced for a period of time, typically by positive angiogenic factors, followed by a return 
to the baseline levels. Sequences that are temporarily induced provide a means to target 
angiogenesis tissue, for example neovascularized tumors, at a particular stage of 
angiogenesis, while avoiding rapidly growing tissues that require perpetual vascularization.
Such positive angiogenic factors include αFGF, βFGF, VEGF, angiogenin and the like.

Induced angiogenesis sequences also are further categorized with respect to 
the timing of induction. For example, some angiogenesis genes may be induced at an early 
time period, such as within 10 minutes of the induction of angiogenesis. Others may be
induced later, such as between 5 and 60 minutes, while yet others may be induced for a time 
period of about two hours or more followed by a return to baseline expression levels. 

In another preferred embodiment are angiogenesis sequences that are inhibited 
or reduced as a function of time followed by a return to "normal" expression levels. 
Inhibitors of angiogenesis are examples of molecules that have this expression profile. These 
sequences also can be further divided into groups depending on the timing of diminished 
expression. For example, some molecules may display reduced expression within 10 minutes 
of the induction of angiogenesis. Others may be diminished later, such as between 5 and 60 
minutes, while others may be diminished for a time period of about two hours or more
followed by a return to baseline. Examples of such negative angiogenic factors include 
thrombospondin and endostatin to name a few. 
In yet another preferred embodiment are angiogenesis sequences that are
induced for prolonged periods. These sequences are typically associated with induction of 
angiogenesis and may participate in induction and/or maintenance of the angiogenesis 
phenotype. 

In another preferred embodiment are angiogenesis sequences, the expression 
of which is reduced or diminished for prolonged periods in angiogenic tissue. These 
sequences are typically angiogenesis inhibitors and their diminution is correlated with an 
increase in angiogenesis. 
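The temporal categories described above (transient induction or repression followed by a return to baseline, and prolonged induction or repression) can be approximated with a simple rule over a time-ordered series of measurements. The Python sketch below is a hypothetical illustration rather than the classification used here: it applies a two-fold threshold and treats a change that is still present at the last time point as prolonged. A series that both rises and falls is reported as transient induction only because that check runs first.

```python
def classify_time_course(baseline, levels, min_fold=2.0):
    """Label a time-ordered series of expression levels relative to baseline.

    'prolonged' means the change is still present at the last time point;
    'transient' means a change occurred but the last point is back near baseline.
    """
    ratios = [level / baseline for level in levels]
    final = ratios[-1]
    if final >= min_fold:
        return "prolonged induction"
    if final <= 1.0 / min_fold:
        return "prolonged repression"
    if max(ratios) >= min_fold:
        return "transient induction"
    if min(ratios) <= 1.0 / min_fold:
        return "transient repression"
    return "no change"


# Hypothetical levels sampled at, e.g., 0, 10, 60 and 120 minutes (baseline = 100).
print(classify_time_course(100.0, [100.0, 450.0, 300.0, 110.0]))  # transient induction
print(classify_time_course(100.0, [100.0, 40.0, 30.0, 35.0]))     # prolonged repression
print(classify_time_course(100.0, [100.0, 90.0, 40.0, 95.0]))     # transient repression
```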

Informatics 

The ability to identify genes that undergo changes in expression with time 
during angiogenesis can additionally provide high-resolution, high-sensitivity datasets which 
can be used in the areas of diagnostics, therapeutics, drug development, biosensor 
development, and other related areas. For example, the expression profiles can be used in 
diagnostic or prognostic evaluation of patients with angiogenesis-associated disease. As
another example, subcellular toxicological information can be generated to better direct drug
structure and activity correlation (see Anderson, L., "Pharmaceutical Proteomics: Targets,
Mechanism, and Function," paper presented at the EBC Proteomics conference, Coronado,
CA (June 11-12, 1998)). Subcellular toxicological information can also be utilized in a
biological sensor device to predict the likely toxicological effect of chemical exposures and
likely tolerable exposure thresholds (see U.S. Patent No. 5,811,231). Similar advantages
accrue from datasets relevant to other biomolecules and bioactive agents (e.g., nucleic acids, 
saccharides, lipids, drugs, and the like). 

Thus, in another embodiment, the present invention provides a database that
includes at least one set of assay data. The data contained in the database are acquired,
e.g., using array analysis either singly or in a library format. The database can be in
substantially any form in which data can be maintained and transmitted, but is preferably an
electronic database. The electronic database of the invention can be maintained on any
electronic device allowing for the storage of and access to the database, such as a personal
computer, but is preferably distributed on a wide area network, such as the World Wide Web.

The focus of the present section on databases that include peptide sequence 
data is for clarity of illustration only. It will be apparent to those of skill in the art that similar 
databases can be assembled for any assay data acquired using an assay of the invention. 

The compositions and methods for identifying and/or quantitating the relative
and/or absolute abundance of a variety of molecular and macromolecular species from a
biological sample undergoing angiogenesis, i.e., the identification of angiogenesis-associated
sequences described herein, provide an abundance of information, which can be correlated
with pathological conditions, predisposition to disease, drug testing, therapeutic monitoring,
gene-disease causal linkages, identification of correlates of immunity and physiological
status, among others. Although the data generated from the assays of the invention are suited
for manual review and analysis, in a preferred embodiment, prior data processing using
high-speed computers is utilized.

An array of methods for indexing and retrieving biomolecular information is 
known in the art. For example, U.S. Patents 6,023,659 and 5,966,712 disclose a relational 
database system for storing biomolecular sequence information in a manner that allows 
sequences to be catalogued and searched according to one or more protein function 
hierarchies. U.S. Patent 5,953,727 discloses a relational database having sequence records 
containing information in a format that allows a collection of partial-length DNA sequences 
to be catalogued and searched according to association with one or more sequencing projects 
for obtaining full-length sequences from the collection of partial length sequences. U.S. 
Patent 5,706,498 discloses a gene database retrieval system for making a retrieval of a gene 
sequence similar to a sequence data item in a gene database based on the degree of similarity 
between a key sequence and a target sequence. U.S. Patent 5,538,897 discloses a method 
using mass spectrometry fragmentation patterns of peptides to identify amino acid sequences
in computer databases by comparison of predicted mass spectra with experimentally-derived 
mass spectra using a closeness-of-fit measure. U.S. Patent 5,926,818 discloses a multi- 
dimensional database comprising a functionality for multi-dimensional data analysis 
described as on-line analytical processing (OLAP), which entails the consolidation of 
projected and actual data according to more than one consolidation path or dimension. U.S. 
Patent 5,295,261 reports a hybrid database structure in which the fields of each database 
record are divided into two classes, navigational and informational data, with navigational 
fields stored in a hierarchical topological map which can be viewed as a tree structure or as 
the merger of two or more such tree structures.

The present invention provides a computer database comprising a computer 
and software for storing in computer-retrievable form assay data records cross-tabulated, e.g., 
with data specifying the source of the target-containing sample from which each sequence 
specificity record was obtained. 

In an exemplary embodiment, at least one of the sources of target-containing
sample is from a control tissue sample known to be free of pathological disorders. In a
variation, at least one of the sources is a known pathological tissue specimen, e.g., a
neoplastic lesion or another tissue specimen to be analyzed for angiogenesis. In another
variation, the assay records cross-tabulate one or more of the following parameters for each
target species in a sample: (1) a unique identification code, which can include, e.g., a target
molecular structure and/or characteristic separation coordinate (e.g., electrophoretic
coordinates); (2) sample source; and (3) absolute and/or relative quantity of the target species
present in the sample.
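One way to realize such cross-tabulated assay records is a small relational table keyed by target and sample source. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table name, column names, the two sample sources, the target codes, and the quantities are all hypothetical placeholders, not data from the invention.

```python
import sqlite3


def crosstab_by_source(records):
    """Store (target_id, sample_source, quantity) records and pivot them.

    Returns one row per target: (target_id, control quantity, lesion quantity),
    with None where a target was not measured in that source.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            """CREATE TABLE assay_record (
                   target_id  TEXT NOT NULL,  -- (1) unique identification code
                   sample_src TEXT NOT NULL,  -- (2) sample source
                   quantity   REAL NOT NULL,  -- (3) absolute or relative quantity
                   PRIMARY KEY (target_id, sample_src)
               )"""
        )
        conn.executemany("INSERT INTO assay_record VALUES (?, ?, ?)", records)
        # Pivot: one output column per (hypothetical) sample source.
        return conn.execute(
            """SELECT target_id,
                      MAX(CASE WHEN sample_src = 'control' THEN quantity END),
                      MAX(CASE WHEN sample_src = 'lesion'  THEN quantity END)
               FROM assay_record
               GROUP BY target_id
               ORDER BY target_id"""
        ).fetchall()
    finally:
        conn.close()


example = [
    ("TGT-001", "control", 1.0),
    ("TGT-001", "lesion", 4.2),
    ("TGT-002", "control", 2.0),
    ("TGT-002", "lesion", 0.5),
]
for target, control_qty, lesion_qty in crosstab_by_source(example):
    print(target, control_qty, lesion_qty, lesion_qty / control_qty)
```

A composite primary key on (target, source) enforces one quantity per target per sample, which is the cross-tabulation the records describe.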

The invention also provides for the storage and retrieval of a collection of
target data in a computer data storage apparatus, which can include magnetic disks, optical
disks, magneto-optical disks, DRAM, SRAM, SGRAM, SDRAM, RDRAM, DDR RAM,
magnetic bubble memory devices, and other data storage devices, including CPU registers
and on-CPU data storage arrays. Typically, the target data records are stored as a bit pattern
in an array of magnetic domains on a magnetizable medium or as an array of charge states or
transistor gate states, such as an array of cells in a DRAM device (e.g., each cell comprising
a transistor and a charge storage area, which may be on the transistor). In one embodiment,
the invention provides such storage devices, and computer systems built therewith,
comprising a bit pattern encoding a protein expression fingerprint record comprising unique
identifiers for at least 10 target data records cross-tabulated with target source.

When the target is a peptide or nucleic acid, the invention preferably provides
a method for identifying related peptide or nucleic acid sequences, comprising performing a
computerized comparison between a peptide or nucleic acid sequence assay record stored in
or retrieved from a computer storage device or database and at least one other sequence. The
comparison can include a sequence analysis or comparison algorithm or computer program
embodiment thereof (e.g., FASTA, TFASTA, GAP, BESTFIT) and/or the comparison may
be of the relative amount of a peptide or nucleic acid sequence in a pool of sequences
determined from a polypeptide or nucleic acid sample of a specimen.
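The second kind of comparison mentioned above, the relative amount of a sequence within a pool, reduces to counting. A minimal sketch, assuming each specimen's pool is available as a list of observed sequences (for example, peptide or tag sequences read out of an assay); the function names are illustrative only and are not part of any named package.

```python
from collections import Counter


def relative_amounts(pool):
    """Fraction of the pool accounted for by each distinct sequence."""
    counts = Counter(seq.upper() for seq in pool)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()} if total else {}


def fold_difference(query, pool_a, pool_b):
    """Relative amount of `query` in pool_a divided by that in pool_b.

    Returns None when the query is absent from pool_b (ratio undefined).
    """
    amount_a = relative_amounts(pool_a).get(query.upper(), 0.0)
    amount_b = relative_amounts(pool_b).get(query.upper(), 0.0)
    return amount_a / amount_b if amount_b else None


print(fold_difference("AAA", ["AAA", "AAA", "CCC", "GGG"], ["AAA", "CCC", "CCC", "GGG"]))
```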

The invention also preferably provides a magnetic disk, such as an IBM-compatible
(DOS, Windows, Windows 95/98/2000, Windows NT, OS/2) or other format
(e.g., Linux, SunOS, Solaris, AIX, SCO Unix, VMS, MV, Macintosh, etc.) floppy diskette or
hard (fixed, Winchester) disk drive, comprising a bit pattern encoding data from an assay of
the invention in a file format suitable for retrieval and processing in a computerized sequence
analysis, comparison, or relative quantitation method.

The invention also provides a network, comprising a plurality of computing
devices linked via a data link, such as an Ethernet cable (coax or 10BaseT), telephone line,
ISDN line, wireless network, optical fiber, or other suitable signal transmission medium,
whereby at least one network device (e.g., computer, disk array, etc.) comprises a pattern of
magnetic domains (e.g., magnetic disk) and/or charge domains (e.g., an array of DRAM
cells) forming a bit pattern encoding data acquired from an assay of the invention.

The invention also provides a method for transmitting assay data that includes
generating an electronic signal on an electronic communications device, such as a modem,
ISDN terminal adapter, DSL, cable modem, ATM switch, or the like, wherein the signal
includes (in native or encrypted format) a bit pattern encoding data from an assay or a
database comprising a plurality of assay results obtained by the method of the invention.

In a preferred embodiment, the invention provides a computer system for
comparing a query target to a database containing an array of data structures, such as an assay
result obtained by the method of the invention, and ranking database targets based on the
degree of identity and gap weight to the target data. A central processor is preferably
initialized to load and execute the computer program for alignment and/or comparison of the
assay results. Data for a query target is entered into the central processor via an I/O device.
Execution of the computer program results in the central processor retrieving the assay data
from the data file, which comprises a binary description of an assay result.

The target data or record and the computer program can be transferred to
secondary memory, which is typically random access memory (e.g., DRAM, SRAM,
SGRAM, or SDRAM). Targets are ranked according to the degree of correspondence
between a selected assay characteristic (e.g., binding to a selected affinity moiety) and the
same characteristic of the query target, and results are output via an I/O device. For example,
a central processor can be a conventional computer (e.g., Intel Pentium, PowerPC, Alpha,
PA-8000, SPARC, MIPS 4400, MIPS 10000, VAX, etc.); a program can be a commercial or
public domain molecular biology software package (e.g., UWGCG Sequence Analysis
Software, Darwin); a data file can be an optical or magnetic disk, a data server, or a memory
device (e.g., DRAM, SRAM, SGRAM, SDRAM, EPROM, bubble memory, flash memory,
etc.); and an I/O device can be a terminal comprising a video display and a keyboard, a modem,
an ISDN terminal adapter, an Ethernet port, a punched card reader, a magnetic strip reader, or
other suitable I/O device.
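The ranking step can be illustrated with a plain global alignment score in which identities add to the score and each gap subtracts a fixed gap weight. This Python sketch uses a simple Needleman-Wunsch-style dynamic program as a stand-in for the commercial or public-domain packages named above; the scoring values (match +1, mismatch -1, gap -2) and the target names are assumptions for illustration only.

```python
def global_alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                           gap: int = -2) -> int:
    """Best global alignment score of a and b with a linear gap weight."""
    # prev[j] holds the best score aligning a[:i-1] with b[:j]; start with all-gap prefixes.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def rank_targets(query: str, database: dict[str, str], **scoring) -> list[tuple[str, int]]:
    """Score the query against every database target and sort best-first."""
    scored = [(name, global_alignment_score(query, seq, **scoring))
              for name, seq in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


print(rank_targets("ACGT", {"t1": "ACGA", "t2": "ACGT", "t3": "TTTT"}))
```

Keeping only two rows of the dynamic-programming table bounds memory by the length of the shorter dimension, which matters when a query is scored against many database records.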

The invention also preferably provides the use of a computer system, such as 
that described above, which comprises: (1) a computer; (2) a stored bit pattern encoding a 
collection of peptide sequence specificity records obtained by the methods of the invention,
which may be stored in the computer; (3) a comparison target, such as a query target; and (4) 
a program for alignment and comparison, typically with rank-ordering of comparison results 
on the basis of computed similarity values. 

Angiogenesis-associated sequences 

Angiogenesis proteins of the present invention may be classified as secreted
proteins, transmembrane proteins or intracellular proteins. In one embodiment, the
angiogenesis protein is an intracellular protein. Intracellular proteins may be found in the
cytoplasm and/or in the nucleus. Intracellular proteins are involved in all aspects of cellular
function and replication (including, e.g., signaling pathways); aberrant expression of such
proteins often results in unregulated or dysregulated cellular processes (see, e.g., Molecular
Biology of the Cell, 3rd Edition, Alberts, Ed., Garland Pub., 1994). For example, many
intracellular proteins have enzymatic activity such as protein kinase activity, protein
phosphatase activity, protease activity, nucleotide cyclase activity, polymerase activity and
the like. Intracellular proteins also serve as docking proteins that are involved in organizing
complexes of proteins, or targeting proteins to various subcellular localizations, and are
involved in maintaining the structural integrity of organelles.

An increasingly appreciated concept in characterizing proteins is the presence
in the proteins of one or more motifs for which defined functions have been attributed. In
addition to the highly conserved sequences found in the enzymatic domain of proteins, highly
conserved sequences have been identified in proteins that are involved in protein-protein
interaction. For example, Src-homology-2 (SH2) domains bind tyrosine-phosphorylated
targets in a sequence-dependent manner. PTB domains, which are distinct from SH2
domains, also bind tyrosine-phosphorylated targets. SH3 domains bind to proline-rich
targets. In addition, PH domains, tetratricopeptide repeats and WD domains, to name only a
few, have been shown to mediate protein-protein interactions. Some of these may also be
involved in binding to phospholipids or other second messengers. As will be appreciated by
one of ordinary skill in the art, these motifs can be identified on the basis of primary
sequence; thus, an analysis of the sequence of proteins may provide insight into both the
enzymatic potential of the molecule and the molecules with which the protein may associate.
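As a concrete example of identifying a motif from primary sequence, SH3 domains typically recognize a proline-rich core of the form P-x-x-P. The sketch below is a simple regular-expression scan, not a substitute for profile-based domain databases; it reports every (possibly overlapping) PxxP occurrence in a protein sequence, and the function and pattern names are hypothetical.

```python
import re

# Zero-width lookahead so that overlapping PxxP occurrences are all reported.
PXXP = re.compile(r"(?=(P[A-Z]{2}P))")


def find_pxxp(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched motif) for each PxxP in the sequence."""
    return [(m.start(), m.group(1)) for m in PXXP.finditer(protein.upper())]


print(find_pxxp("APPLPPKP"))
```

Because PxxP occurs by chance in many proteins, such a scan only flags candidates; context (flanking basic residues, structural accessibility) decides whether a site is a genuine SH3 ligand.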

In another embodiment, the angiogenesis sequences are transmembrane
proteins. Transmembrane proteins are molecules that span a phospholipid bilayer of a cell.
They may have an intracellular domain, an extracellular domain, or both. The intracellular
domains of such proteins may have a number of functions including those already described
for intracellular proteins. For example, the intracellular domain may have enzymatic activity
and/or may serve as a binding site for additional proteins. Frequently the intracellular
domain of transmembrane proteins serves both roles. For example, certain receptor tyrosine
kinases have both protein kinase activity and SH2 domains. In addition, autophosphorylation
of tyrosines on the receptor molecule itself creates binding sites for additional SH2
domain-containing proteins.

Transmembrane proteins may contain from one to many transmembrane
domains. For example, receptor tyrosine kinases, certain cytokine receptors, receptor
guanylyl cyclases and receptor serine/threonine protein kinases contain a single
transmembrane domain. However, various other proteins including channels and adenylyl
cyclases contain numerous transmembrane domains. Many important cell surface receptors
such as G protein coupled receptors (GPCRs) are classified as "seven transmembrane
domain" proteins, as they contain 7 membrane spanning regions. Characteristics of
transmembrane domains include approximately 20 consecutive hydrophobic amino acids that
may be followed by charged amino acids. Therefore, upon analysis of the amino acid
sequence of a particular protein, the localization and number of transmembrane domains
within the protein may be predicted (see, e.g., PSORT web site http://psort.nibb.ac.jp/).
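The hydrophobic-stretch rule described above can be sketched with a sliding-window hydropathy scan. This Python example substitutes the Kyte-Doolittle hydropathy scale for the PSORT tools cited in the text; the 19-residue window and the mean-hydropathy cutoff of 1.6 are the values commonly quoted from Kyte and Doolittle's analysis and are treated here as adjustable assumptions. The output is a rough list of candidate spans, not a topology prediction.

```python
# Kyte-Doolittle hydropathy values per residue (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def candidate_tm_segments(protein: str, window: int = 19,
                          cutoff: float = 1.6) -> list[tuple[int, int]]:
    """Return merged (start, end) spans, 0-based with end exclusive, covering
    windows whose mean hydropathy is at or above `cutoff`.

    Residues missing from the scale (e.g. 'X') score 0.
    """
    seq = protein.upper()
    segments: list[tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        mean = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq[start:start + window]) / window
        if mean < cutoff:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # overlaps the previous hit: extend it
        else:
            segments.append((start, end))
    return segments


example = "K" * 10 + "L" * 20 + "K" * 10  # artificial hydrophobic core flanked by lysines
print(candidate_tm_segments(example))
```

Window smoothing blurs the boundaries (the reported span extends a few residues into the charged flanks), which is why dedicated predictors refine the ends.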

The extracellular domains of transmembrane proteins are diverse; however,
conserved motifs are found repeatedly among various extracellular domains. Conserved
structure and/or functions have been ascribed to different extracellular motifs. Many
extracellular domains are involved in binding to other molecules. In one aspect, extracellular
domains are found on receptors. Factors that bind the receptor domain include circulating
ligands, which may be peptides, proteins, or small molecules such as adenosine and the like.
For example, growth factors such as EGF, FGF and PDGF are circulating growth factors that
bind to their cognate receptors to initiate a variety of cellular responses. Other factors include
cytokines, mitogenic factors, neurotrophic factors and the like. Extracellular domains also
bind to cell-associated molecules. In this respect, they mediate cell-cell interactions.
Cell-associated ligands can be tethered to the cell, for example via a glycosylphosphatidylinositol
(GPI) anchor, or may themselves be transmembrane proteins. Extracellular domains also
associate with the extracellular matrix and contribute to the maintenance of the cell structure.

Angiogenesis proteins that are transmembrane are particularly preferred in the
present invention as they are readily accessible targets for immunotherapeutics, as are
described herein. In addition, as outlined below, transmembrane proteins can also be useful
in imaging modalities. Antibodies may be used to label such readily accessible proteins in
situ. Alternatively, antibodies can also label intracellular proteins, in which case samples are
typically permeabilized to provide access to intracellular proteins.

It will also be appreciated by those in the art that a transmembrane protein can 
be made soluble by removing transmembrane sequences, for example through recombinant 
methods. Furthermore, transmembrane proteins that have been made soluble can be made to 
be secreted through recombinant means by adding an appropriate signal sequence. 

In another embodiment, the angiogenesis proteins are secreted proteins; the 
secretion of which can be either constitutive or regulated. These proteins have a signal 
peptide or signal sequence that targets the molecule to the secretory pathway. Secreted 
proteins are involved in numerous physiological events; by virtue of their circulating nature, 
they serve to transmit signals to various other cell types. The secreted protein may function in 
an autocrine manner (acting on the cell that secreted the factor), a paracrine manner (acting 
on cells in close proximity to the cell that secreted the factor) or an endocrine manner (acting 
on cells at a distance). Thus secreted molecules find use in modulating or altering numerous 
aspects of physiology. Angiogenesis proteins that are secreted proteins are particularly 
preferred in the present invention as they serve as good targets for diagnostic markers, e.g.,
for blood or serum tests.

An angiogenesis sequence is initially identified by substantial nucleic acid 
and/or amino acid sequence homology or linkage to the angiogenesis sequences outlined 
herein. Such homology can be based upon the overall nucleic acid or amino acid sequence, 
and is generally determined as outlined below, using either homology programs or 
hybridization conditions. Typically, linked sequences on an mRNA are found on the same
molecule.

As detailed in the definitions, percent identity can be determined using an 
algorithm such as BLAST. A preferred method utilizes the BLASTN module of WU- 
BLAST-2 set to the default parameters, with overlap span and overlap fraction set to 1 and 
0.125, respectively. The alignment may include the introduction of gaps in the sequences to 
be aligned. In addition, for sequences which contain either more or fewer nucleotides than 
those of the nucleic acids of the figure, it is understood that the percentage of homology will
be determined based on the number of homologous nucleosides in relation to the total number 
of nucleosides. Thus, for example, homology of sequences shorter than those of the 
sequences identified herein and as discussed below, will be determined using the number of 
nucleosides in the shorter sequence. 
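The rule that homology to a shorter sequence is scored against the length of the shorter sequence can be written out directly. A minimal sketch, assuming the two sequences have already been aligned (for example by BLASTN as described above) and are supplied as equal-length strings with '-' marking gaps; the function name is illustrative.

```python
def percent_identity_shorter(aligned_a: str, aligned_b: str) -> float:
    """Identical aligned positions as a percentage of the shorter ungapped sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    a, b = aligned_a.upper(), aligned_b.upper()
    # Count columns where both sequences carry the same base (gap columns excluded).
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    shorter = min(len(a.replace("-", "")), len(b.replace("-", "")))
    if shorter == 0:
        raise ValueError("at least one sequence is empty")
    return 100.0 * identical / shorter


print(percent_identity_shorter("ACGTACGT", "ACCTAC--"))
```

Dividing by the shorter length means a short fragment fully contained in a longer sequence scores 100%, which is the intent when the sequences identified herein are partial.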

In one embodiment, the nucleic acid homology is determined through
hybridization studies. Thus, e.g., a nucleic acid that hybridizes under high stringency to a
nucleic acid of Table 1, or its complement, or that is also found on naturally occurring
mRNAs, is considered an angiogenesis sequence. In another embodiment, less stringent
hybridization conditions are used; for example, moderate or low stringency conditions may
be used, as are known in the art; see Ausubel, supra, and Tijssen, supra.

In addition, the angiogenesis nucleic acid sequences of the invention, e.g., the
sequences in Table 1, are fragments of larger genes, i.e., they are nucleic acid segments.
"Genes" in this context includes coding regions, non-coding regions, and mixtures of coding
and non-coding regions. Accordingly, as will be appreciated by those in the art, using the
sequences provided herein, extended sequences of the angiogenesis genes, in either direction,
can be obtained using techniques well known in the art for cloning either longer sequences or
the full-length sequences; see Ausubel, et al., supra. Much can be done by informatics, and
many sequences can be clustered to include multiple sequences, e.g., using systems such as
UniGene (see, http://www.ncbi.nlm.nih.gov/UniGene/).

Once the angiogenesis nucleic acid is identified, it can be cloned and, if 
necessary, its constituent parts recombined to form the entire angiogenesis nucleic acid 
coding regions or the entire mRNA sequence. Once isolated from its natural source, e.g., 
contained within a plasmid or other vector or excised therefrom as a linear nucleic acid 
segment, the recombinant angiogenesis nucleic acid can be further used as a probe to identify
and isolate other angiogenesis nucleic acids, for example extended coding regions. It can 
also be used as a "precursor" nucleic acid to make modified or variant angiogenesis nucleic 
acids and proteins. 

The angiogenesis nucleic acids of the present invention are used in several 
ways. In a first embodiment, nucleic acid probes to the angiogenesis nucleic acids are made 
and attached to biochips to be used in screening and diagnostic methods, as outlined below, 
or for administration, for example for gene therapy, vaccine, and/or antisense applications. 
Alternatively, the angiogenesis nucleic acids that include coding regions of angiogenesis 
proteins can be put into expression vectors for the expression of angiogenesis proteins, again 
for screening purposes or for administration to a patient.

In a preferred embodiment, nucleic acid probes to angiogenesis nucleic acids
(both the nucleic acid sequences outlined in the figures and/or the complements thereof) are
made. The nucleic acid probes attached to the biochip are designed to be substantially
complementary to the angiogenesis nucleic acids, i.e., the target sequence (either the target
sequence of the sample or to other probe sequences, for example in sandwich assays), such
that hybridization of the target sequence and the probes of the present invention occurs. As
outlined below, this complementarity need not be perfect; there may be any number of base
pair mismatches which will interfere with hybridization between the target sequence and the
single stranded nucleic acids of the present invention. However, if the number of mutations
is so great that no hybridization can occur under even the least stringent of hybridization
conditions, the sequence is not a complementary target sequence. Thus, by "substantially
complementary" herein is meant that the probes are sufficiently complementary to the target
sequences to hybridize under normal reaction conditions, particularly high stringency
conditions, as outlined herein.
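Whether a probe is "substantially complementary" ultimately depends on the hybridization conditions, but a first-pass screen often starts from a simple mismatch count. The sketch below counts non-Watson-Crick pairs when a probe pairs antiparallel with a DNA target segment of the same length; it does not model stringency, G-U pairs, bulges, or RNA targets, and the function name is hypothetical.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def count_mismatches(probe: str, target_segment: str) -> int:
    """Count non-Watson-Crick pairs when `probe` (5'->3') pairs antiparallel
    with `target_segment` (5'->3') of the same length."""
    if len(probe) != len(target_segment):
        raise ValueError("probe and target segment must have equal length")
    # Reversing the target lines up the probe's 5' end with the target's 3' end.
    paired = zip(probe.upper(), target_segment.upper()[::-1])
    return sum((p, t) not in WATSON_CRICK for p, t in paired)


print(count_mismatches("GGATTT", "AAACCC"))
```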

A nucleic acid probe is generally single stranded but can be partially single
and partially double stranded. The strandedness of the probe is dictated by the structure,
composition, and properties of the target sequence. In general, the nucleic acid probes range
from about 8 to about 100 bases long, with from about 10 to about 80 bases being preferred,
and from about 30 to about 50 bases being particularly preferred. That is, generally whole
genes are not used. In some embodiments, much longer nucleic acids can be used, up to
hundreds of bases.

In a preferred embodiment, more than one probe per sequence is used, with
either overlapping probes or probes to different sections of the target being used. That is,
two, three, four or more probes, with three being preferred, are used to build in a redundancy
for a particular target. The probes can be overlapping (i.e., have some sequence in common),
or separate. In some cases, PCR primers may be used to amplify signal for higher sensitivity.
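Building redundancy with several probes per target can be sketched as evenly spaced windows whose reverse complements serve as probes. This is an illustrative Python example, not the design procedure of the invention: the 40-base probe length (within the 30 to 50 base range described above), three probes per target, and even spacing are assumptions. With this spacing the probes overlap when the target is short and are separate when it is long.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tile_probes(target: str, probe_len: int = 40, n_probes: int = 3) -> list[tuple[int, str]]:
    """Return (start position on target, probe sequence) for evenly spaced probes.

    Each probe is the reverse complement of a probe_len window of the target, so it
    is complementary to the target when both are written 5' to 3'.
    """
    target = target.upper()
    if len(target) < probe_len:
        raise ValueError("target is shorter than the requested probe length")
    if n_probes < 1:
        raise ValueError("n_probes must be at least 1")
    if n_probes == 1:
        starts = [0]
    else:
        step = (len(target) - probe_len) / (n_probes - 1)
        starts = [round(i * step) for i in range(n_probes)]
    return [(s, reverse_complement(target[s:s + probe_len])) for s in starts]


example_target = "A" * 20 + "C" * 20 + "G" * 20 + "T" * 20  # 80-base toy target
for start, probe in tile_probes(example_target):
    print(start, probe)
```

A practical design would also filter candidate windows for melting temperature, repeats, and cross-hybridization before accepting them.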

As will be appreciated by those in the art, nucleic acids can be attached or
immobilized to a solid support in a wide variety of ways. By "immobilized" and grammatical
equivalents herein is meant that the association or binding between the nucleic acid probe and
the solid support is sufficient to be stable under the conditions of binding, washing, analysis,
and removal as outlined below. The binding can typically be covalent or non-covalent. By
"non-covalent binding" and grammatical equivalents herein is meant one or more of
electrostatic, hydrophilic, and hydrophobic interactions. Included in non-covalent binding is
the covalent attachment of a molecule, such as streptavidin, to the support and the
non-covalent binding of the biotinylated probe to the streptavidin. By "covalent binding" and
grammatical equivalents herein is meant that the two moieties, the solid support and the
probe, are attached by at least one bond, including sigma bonds, pi bonds and coordination
bonds. Covalent bonds can be formed directly between the probe and the solid support or can
be formed by a cross linker or by inclusion of a specific reactive group on either the solid
support or the probe or both molecules. Immobilization may also involve a combination of
covalent and non-covalent interactions.

In general, the probes are attached to the biochip in a wide variety of ways, as 
will be appreciated by those in the art. As described herein, the nucleic acids can either be 
synthesized first, with subsequent attachment to the biochip, or can be directly synthesized on 
the biochip. 

The biochip comprises a suitable solid substrate. By "substrate" or "solid 
support" or other grammatical equivalents herein is meant a material that can be modified to 
contain discrete individual sites appropriate for the attachment or association of the nucleic 
acid probes and is amenable to at least one detection method. As will be appreciated by those
in the art, the number of possible substrates is very large; substrates include, but are not
limited to, glass and modified or functionalized glass, plastics (including acrylics, polystyrene
and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene,
polyurethanes, Teflon™, etc.), polysaccharides, nylon or nitrocellulose, resins, silica or
silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses,
plastics, etc. In general, the substrates allow optical detection and do not appreciably
fluoresce. A preferred substrate is described in copending application entitled Reusable
Low Fluorescent Plastic Biochip, U.S. Application Serial No. 09/270,214, filed March 15, 
1999, herein incorporated by reference in its entirety. 

Generally the substrate is planar, although as will be appreciated by those in 
the art, other configurations of substrates may be used as well. For example, the probes may 
be placed on the inside surface of a tube, for flow-through sample analysis to minimize 
sample volume. Similarly, the substrate may be flexible, such as a flexible foam, including 
closed cell foams made of particular plastics. 

In a preferred embodiment, the surface of the biochip and the probe may be 
derivatized with chemical functional groups for subsequent attachment of the two. Thus, for 
example, the biochip is derivatized with a chemical functional group including, but not 
limited to, amino groups, carboxy groups, oxo groups and thiol groups, with amino groups 
being particularly preferred. Using these functional groups, the probes can be attached using
functional groups on the probes. For example, nucleic acids containing amino groups can be
attached to surfaces comprising amino groups, for example using linkers as are known in the
art; for example, homo- or hetero-bifunctional linkers as are well known (see 1994 Pierce
Chemical Company catalog, technical section on cross-linkers, pages 155-200, incorporated
herein by reference). In addition, in some cases, additional linkers, such as alkyl groups
(including substituted and heteroalkyl groups) may be used.

In this embodiment, oligonucleotides are synthesized as is known in the art, 
and then attached to the surface of the solid support. As will be appreciated by those skilled 
in the art, either the 5' or 3' terminus may be attached to the solid support, or attachment may
be via an internal nucleoside. 

In another embodiment, the immobilization to the solid support may be very 
strong, yet non-covalent. For example, biotinylated oligonucleotides can be made, which 
bind to surfaces covalently coated with streptavidin, resulting in attachment. 

Alternatively, the oligonucleotides may be synthesized on the surface, as is 
known in the art. For example, photoactivation techniques utilizing photopolymerization 
compounds and techniques are used. In a preferred embodiment, the nucleic acids can be 
synthesized in situ, using well known photolithographic techniques, such as those described 
in WO 95/25116; WO 95/35505; U.S. Patent Nos. 5,700,637 and 5,445,934; and references
cited within, all of which are expressly incorporated by reference; these methods of
attachment form the basis of the Affymetrix GeneChip™ technology.

Often, amplification-based assays are performed to measure the expression 
level of angiogenesis-associated sequences. These assays are typically performed in 
conjunction with reverse transcription. In such assays, an angiogenesis-associated nucleic 
acid sequence acts as a template in an amplification reaction (e.g., the polymerase chain
reaction, or PCR). In a quantitative amplification, the amount of amplification product will
be proportional to the amount of template in the original sample during the exponential phase
of the reaction. Comparison to appropriate controls provides a measure of the amount of
angiogenesis-associated RNA. Methods of quantitative amplification are well known to those
of skill in the art. Detailed protocols for quantitative PCR are provided, e.g., in Innis et al.
(1990) PCR Protocols, A Guide to Methods and Applications (Academic Press, Inc., N.Y.).
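One widely used way to turn quantitative PCR measurements into a relative amount is the comparative threshold-cycle (delta-delta Ct) calculation. The text does not specify this method; the Python sketch below is a swapped-in illustration that assumes a reference (housekeeping) transcript measured in the same samples and an amplification efficiency of 2.0, i.e., perfect doubling per cycle. The Ct values in the usage line are hypothetical.

```python
def relative_expression_ddct(
    ct_target_test: float,
    ct_reference_test: float,
    ct_target_control: float,
    ct_reference_control: float,
    efficiency: float = 2.0,  # assumed fold amplification per cycle (2.0 = perfect doubling)
) -> float:
    """Expression of the target in the test sample relative to the control sample,
    normalized to a reference transcript (comparative Ct method)."""
    delta_test = ct_target_test - ct_reference_test
    delta_control = ct_target_control - ct_reference_control
    delta_delta = delta_test - delta_control
    # A lower Ct means more starting template, hence the negative exponent.
    return efficiency ** (-delta_delta)


# Hypothetical values: after normalization the target crosses threshold 3 cycles earlier.
print(relative_expression_ddct(22.0, 18.0, 25.0, 18.0))  # 8.0
```

The method is only as good as its efficiency assumption; when target and reference amplify with different efficiencies, an efficiency-corrected calculation is generally preferred.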

In some embodiments, a TaqMan based assay is used to measure expression. 
TaqMan based assays use a fluorogenic oligonucleotide probe that contains a 5' fluorescent 
dye and a 3' quenching agent. The probe hybridizes to a PCR product, but cannot itself be 
extended due to a blocking agent at the 3' end. When the PCR product is amplified in
subsequent cycles, the 5' nuclease activity of the polymerase, e.g., AmpliTaq, results in the
cleavage of the TaqMan probe. This cleavage separates the 5' fluorescent dye and the 3'
quenching agent, thereby resulting in an increase in fluorescence as a function of
amplification (see, for example, literature provided by Perkin-Elmer, e.g.,
www2.perkin-elmer.com).
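Because TaqMan fluorescence rises with each cycle of amplification, the cycle at which the signal crosses a fixed threshold (the threshold cycle, Ct) is commonly used as the quantitative readout. A minimal sketch, assuming one background-corrected fluorescence reading per cycle starting at cycle 1 and a user-chosen threshold; instrument software typically applies more elaborate baseline correction and curve fitting, and the function name here is hypothetical.

```python
from typing import Optional, Sequence


def threshold_cycle(fluorescence: Sequence[float], threshold: float) -> Optional[float]:
    """Fractional cycle at which fluorescence first reaches `threshold`.

    fluorescence[0] is the reading after cycle 1. Between the two readings that
    bracket the threshold, the crossing point is found by linear interpolation.
    Returns None if the threshold is never reached.
    """
    if not fluorescence:
        return None
    if fluorescence[0] >= threshold:
        return 1.0
    for i in range(1, len(fluorescence)):
        before, after = fluorescence[i - 1], fluorescence[i]
        if before < threshold <= after:
            # Reading i-1 belongs to cycle i and reading i to cycle i + 1.
            return i + (threshold - before) / (after - before)
    return None


curve = [0.1, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2]  # illustrative readings, cycles 1 to 7
print(threshold_cycle(curve, 0.6))
```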

Other suitable amplification methods include, but are not limited to, ligase
chain reaction (LCR) (see, Wu and Wallace (1989) Genomics 4: 560, Landegren et al. (1988)
Science 241: 1077, and Barringer et al. (1990) Gene 89: 117), transcription amplification
(Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86: 1173), self-sustained sequence replication
(Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87: 1874), dot PCR, and linker adapter PCR,
etc.

In a preferred embodiment, angiogenesis nucleic acids, e.g., encoding
angiogenesis proteins, are used to make a variety of expression vectors to express
angiogenesis proteins which can then be used in screening assays, as described below.
Expression vectors and recombinant DNA technology are well known to those of skill in the
art (see, e.g., Ausubel, supra, and Gene Expression Systems, Fernandez & Hoeffler, Eds,
Academic Press, 1999) and are used to express proteins. The expression vectors may be
either self-replicating extrachromosomal vectors or vectors which integrate into a host
genome. Generally, these expression vectors include transcriptional and translational
regulatory nucleic acid operably linked to the nucleic acid encoding the angiogenesis protein.
The term "control sequences" refers to DNA sequences used for the expression of an
operably linked coding sequence in a particular host organism. Control sequences that are
suitable for prokaryotes, for example, include a promoter, optionally an operator sequence,
and a ribosome binding site. Eukaryotic cells are known to utilize promoters,
polyadenylation signals, and enhancers.

Nucleic acid is "operably linked" when it is placed into a functional 
relationship with another nucleic acid sequence. For example, DNA for a presequence or 
secretory leader is operably linked to DNA for a polypeptide if it is expressed as a preprotein
that participates in the secretion of the polypeptide; a promoter or enhancer is operably linked 
to a coding sequence if it affects the transcription of the sequence; or a ribosome binding site 
is operably linked to a coding sequence if it is positioned so as to facilitate translation. 
Generally, "operably linked" means that the DNA sequences being linked are contiguous, 
and, in the case of a secretory leader, contiguous and in reading phase. However, enhancers
do not have to be contiguous. Linking is typically accomplished by ligation at convenient 
restriction sites. If such sites do not exist, synthetic oligonucleotide adaptors or linkers are 
used in accordance with conventional practice. Transcriptional and translational regulatory 
nucleic acid will generally be appropriate to the host cell used to express the angiogenesis 
protein; for example, transcriptional and translational regulatory nucleic acid sequences from
Bacillus are preferably used to express the angiogenesis protein in Bacillus. Numerous types 
of appropriate expression vectors, and suitable regulatory sequences are known in the art for 
a variety of host cells. 

In general, transcriptional and translational regulatory sequences may include, 
but are not limited to, promoter sequences, ribosomal binding sites, transcriptional start and 
stop sequences, translational start and stop sequences, and enhancer or activator sequences. 
In a preferred embodiment, the regulatory sequences include a promoter and transcriptional 
start and stop sequences. 

Promoter sequences encode either constitutive or inducible promoters. The 
promoters may be either naturally occurring promoters or hybrid promoters. Hybrid 
promoters, which combine elements of more than one promoter, are also known in the art, 
and are useful in the present invention. 

In addition, an expression vector may comprise additional elements. For 
example, the expression vector may have two replication systems, thus allowing it to be 
maintained in two organisms, for example in mammalian or insect cells for expression and in 
a prokaryotic host for cloning and amplification. Furthermore, for integrating expression
vectors, the expression vector contains at least one sequence homologous to the host cell 
genome, and preferably two homologous sequences which flank the expression construct. 
The integrating vector may be directed to a specific locus in the host cell by selecting the 
appropriate homologous sequence for inclusion in the vector. Constructs for integrating 
vectors are well known in the art (e.g., Fernandez & Hoeffler, supra). 

In addition, in a preferred embodiment, the expression vector contains a 
selectable marker gene to allow the selection of transformed host cells. Selection genes are 
well known in the art and will vary with the host cell used. 

The angiogenesis proteins of the present invention are produced by culturing a 
host cell transformed with an expression vector containing nucleic acid encoding an 
angiogenesis protein, under the appropriate conditions to induce or cause expression of the 
angiogenesis protein. Conditions appropriate for angiogenesis protein expression will vary 
with the choice of the expression vector and the host cell, and will be easily ascertained by
one skilled in the art through routine experimentation or optimization. For example, the use 
of constitutive promoters in the expression vector will require optimizing the growth and 
proliferation of the host cell, while the use of an inducible promoter requires the appropriate 
growth conditions for induction. In addition, in some embodiments, the timing of the harvest 
is important. For example, the baculoviral systems used in insect cell expression are lytic
viruses, and thus harvest time selection can be crucial for product yield. 

Appropriate host cells include yeast, bacteria, archaebacteria, fungi, and insect 
and animal cells, including mammalian cells. Of particular interest are Saccharomyces 
cerevisiae and other yeasts, E. coli, Bacillus subtilis, Sf9 cells, C129 cells, 293 cells,
Neurospora, BHK, CHO, COS, HeLa cells, HUVEC (human umbilical vein endothelial 
cells), THP1 cells (a macrophage cell line) and various other human cells and cell lines. 

In a preferred embodiment, the angiogenesis proteins are expressed in 
mammalian cells. Mammalian expression systems are also known in the art, and include 
retroviral and adenoviral systems. Of particular use as mammalian promoters are the 
promoters from mammalian viral genes, since the viral genes are often highly expressed and 
have a broad host range. Examples include the SV40 early promoter, mouse mammary tumor 
virus LTR promoter, adenovirus major late promoter, herpes simplex virus promoter, and the 
CMV promoter (see, e.g., Fernandez & Hoeffler, supra). Typically, transcription termination 
and polyadenylation sequences recognized by mammalian cells are regulatory regions located
3' to the translation stop codon and thus, together with the promoter elements, flank the
coding sequence. Examples of transcription terminator and polyadenylation signals include
those derived from SV40.

The methods of introducing exogenous nucleic acid into mammalian hosts, as
well as other hosts, are well known in the art and will vary with the host cell used.
Techniques include dextran-mediated transfection, calcium phosphate precipitation, 
polybrene mediated transfection, protoplast fusion, electroporation, viral infection, 
encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA 
into nuclei. 

In a preferred embodiment, angiogenesis proteins are expressed in bacterial 
systems. Bacterial expression systems are well known in the art. Promoters from 
bacteriophage may also be used and are known in the art. In addition, synthetic promoters 
and hybrid promoters are also useful; for example, the tac promoter is a hybrid of the trp and
lac promoter sequences. Furthermore, a bacterial promoter can include naturally occurring 
promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and 
initiate transcription. In addition to a functioning promoter sequence, an efficient ribosome 
binding site is desirable. The expression vector may also include a signal peptide sequence 
that provides for secretion of the angiogenesis protein in bacteria. The protein is either 
secreted into the growth media (gram-positive bacteria) or into the periplasmic space, located
between the inner and outer membrane of the cell (gram-negative bacteria). The bacterial 
expression vector may also include a selectable marker gene to allow for the selection of 
bacterial strains that have been transformed. Suitable selection genes include genes which 
render the bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, 
kanamycin, neomycin and tetracycline. Selectable markers also include biosynthetic genes, 
such as those in the histidine, tryptophan and leucine biosynthetic pathways. These 
components are assembled into expression vectors. Expression vectors for bacteria are well 
known in the art, and include vectors for Bacillus subtilis, E. coli, Streptococcus cremoris, 
and Streptococcus lividans, among others (e.g., Fernandez & Hoeffler, supra). The bacterial
expression vectors are transformed into bacterial host cells using techniques well known in 
the art, such as calcium chloride treatment, electroporation, and others. 

In one embodiment, angiogenesis proteins are produced in insect cells. 
Expression vectors for the transformation of insect cells, and in particular, baculovirus-based 
expression vectors, are well known in the art.

In a preferred embodiment, angiogenesis protein is produced in yeast cells. 
Yeast expression systems are well known in the art, and include expression vectors for 
Saccharomyces cerevisiae, Candida albicans and C. maltosa, Hansenula polymorpha, 
Kluyveromyces fragilis and K. lactis, Pichia guillerimondii and P. pastoris, 
Schizosaccharomyces pombe, and Yarrowia lipolytica. 

The angiogenesis protein may also be made as a fusion protein, using 
techniques well known in the art. Thus, for example, for the creation of monoclonal 
antibodies, if the desired epitope is small, the angiogenesis protein may be fused to a carrier 
protein to form an immunogen. Alternatively, the angiogenesis protein may be made as a 
fusion protein to increase expression, or for other reasons. For example, when the 
angiogenesis protein is an angiogenesis peptide, the nucleic acid encoding the peptide may be 
linked to other nucleic acid for expression purposes. 

In one embodiment, the angiogenesis nucleic acids, proteins and antibodies of 
the invention are labeled. By "labeled" herein is meant that a compound has at least one 
element, isotope or chemical compound attached to enable the detection of the compound. In
general, labels fall into three classes: a) isotopic labels, which may be radioactive or heavy 
isotopes; b) immune labels, which may be antibodies or antigens; and c) colored or 
fluorescent dyes. The labels may be incorporated into the angiogenesis nucleic acids, 
proteins and antibodies at any position. In general, the label should be capable of
producing, either directly or indirectly, a detectable signal. The detectable moiety may be a
radioisotope, such as ³H, ¹⁴C, ³²P, ³⁵S, or ¹²⁵I; a fluorescent or chemiluminescent compound,
such as fluorescein isothiocyanate, rhodamine, or luciferin; or an enzyme, such as alkaline
phosphatase, beta-galactosidase or horseradish peroxidase. Any method known in the art for
conjugating the antibody to the label may be employed, including those methods described by
Hunter et al., Nature, 144:945 (1962); David et al., Biochemistry, 13:1014 (1974); Pain et
al., J. Immunol. Meth., 40:219 (1981); and Nygren, J. Histochem. and Cytochem., 30:407
(1982).

Accordingly, the present invention also provides angiogenesis protein 
sequences. An angiogenesis protein of the present invention may be identified in several 
ways. "Protein" in this sense includes proteins, polypeptides, and peptides. As will be
appreciated by those in the art, the nucleic acid sequences of the invention can be used to
generate protein sequences. There are a variety of ways to do this, including cloning the
entire gene and verifying its frame and amino acid sequence, or comparing it to known
sequences to search for homology to provide a frame, assuming the angiogenesis protein has
an identifiable motif or homology to some protein in the database being used. Generally, the 
nucleic acid sequences are input into a program that will search all three frames for 
homology. This is done in a preferred embodiment using the following NCBI Advanced
BLAST parameters. The program is blastx or blastn. The database is nr. The input data is
"Sequence in FASTA format". The organism list is "none". The "expect" is 10; the filter is
default. The "descriptions" is 500, the "alignments" is 500, and the "alignment view" is
pairwise. The "Query Genetic Codes" is standard (1). The matrix is BLOSUM62; the gap
existence cost is 11, the per-residue gap cost is 1, and the lambda ratio is 0.85 (default).
This results in the generation of a putative protein sequence.
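
The frame-translation step behind such a search can be sketched locally. The Python below is an illustration only, not the BLAST program: it translates the three forward reading frames of a nucleotide sequence using the standard genetic code (NCBI translation table 1, the "standard (1)" code named above). The function names are hypothetical, and blastx itself also searches the three reverse-strand frames, which this sketch omits.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); amino acids are listed for
# codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(nucleotides):
    """Translate a DNA/RNA string codon by codon; incomplete trailing bases are ignored."""
    seq = nucleotides.upper().replace("U", "T")
    # Codons containing ambiguous bases (e.g., N) are rendered as "X".
    return "".join(CODON_TABLE.get(seq[pos:pos + 3], "X")
                   for pos in range(0, len(seq) - 2, 3))


def three_frame_translations(nucleotides):
    """Return the conceptual translations of the three forward reading frames."""
    return {frame + 1: translate(nucleotides[frame:]) for frame in range(3)}


print(three_frame_translations("ATGGCCTAA"))  # {1: 'MA*', 2: 'WP', 3: 'GL'}
```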

Also included within one embodiment of angiogenesis proteins are amino acid 
variants of the naturally occurring sequences, as determined herein. Preferably, the variants
are greater than about 75% homologous to the wild-type sequence, more
preferably greater than about 80%, even more preferably greater than about 85% and most
preferably greater than 90%. In some embodiments the homology will be as high as about 93
to 95 or 98%. As for nucleic acids, homology in this context means sequence similarity or
identity, with identity being preferred. This homology will be determined using standard
techniques well known in the art as are outlined above for the nucleic acid homologies. 
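
As a worked illustration of the identity thresholds above, the following Python sketch computes percent identity over two sequences that are assumed to be already aligned (equal length, gaps written as "-") and reports the highest tier exceeded. The normalization used here (skipping columns that are gaps in both sequences and counting a residue opposite a gap as a mismatch) is one convention among several; alignment tools differ, and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding identical residues.

    Assumes the sequences are pre-aligned. Columns that are gaps in both are
    skipped; a residue opposite a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Preference tiers described above, checked from most to least preferred.
TIERS = [(90.0, ">90%"), (85.0, ">85%"), (80.0, ">80%"), (75.0, ">75%")]


def preference_tier(identity):
    """Return the highest tier whose cutoff the identity strictly exceeds, or None."""
    for cutoff, label in TIERS:
        if identity > cutoff:
            return label
    return None


wild_type = "MKTAYIAKQR"
variant = "MKTAYLAKQR"  # one substitution in ten residues (hypothetical)
score = percent_identity(wild_type, variant)
print(score, preference_tier(score))  # 90.0 >85%
```

Because the tiers use "greater than", a variant at exactly 90.0% identity falls into the ">85%" tier rather than the ">90%" tier.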

Angiogenesis proteins of the present invention may be shorter or longer than 
the wild type amino acid sequences. Thus, in a preferred embodiment, included within the 
definition of angiogenesis proteins are portions or fragments of the wild type sequences
described herein. In addition, as outlined above, the angiogenesis nucleic acids of the invention may be
used to obtain additional coding regions, and thus additional protein sequence, using 
techniques known in the art. 
In a preferred embodiment, the angiogenesis proteins are derivative or variant
angiogenesis proteins as compared to the wild-type sequence. That is, as outlined more fully
below, the derivative angiogenesis peptide will often contain at least one amino acid 
substitution, deletion or insertion, with amino acid substitutions being particularly preferred. 
The amino acid substitution, insertion or deletion may occur at any residue within the 
angiogenesis peptide.

Also included within one embodiment of angiogenesis proteins of the present 
invention are amino acid sequence variants. These variants typically fall into one or more of 
three classes: substitutional, insertional or deletional variants. These variants ordinarily are
prepared by site-specific mutagenesis of nucleotides in the DNA encoding the angiogenesis
protein, using cassette or PCR mutagenesis or other techniques well known in the art, to
produce DNA encoding the variant, and thereafter expressing the DNA in recombinant cell
culture as outlined above. However, variant angiogenesis protein fragments having up to
about 100-150 residues may be prepared by in vitro synthesis using established techniques.
Amino acid sequence variants are characterized by the predetermined nature of the variation,
a feature that sets them apart from naturally occurring allelic or interspecies variation of the
angiogenesis protein amino acid sequence. The variants typically exhibit the same qualitative 
biological activity as the naturally occurring analogue, although variants can also be selected 
which have modified characteristics as will be more fully outlined below. 

While the site or region for introducing an amino acid sequence variation is 
predetermined, the mutation per se need not be predetermined. For example, in order to
optimize the performance of a mutation at a given site, random mutagenesis may be 
conducted at the target codon or region and the expressed angiogenesis variants screened for 
the optimal combination of desired activity. Techniques for making substitution mutations at 
predetermined sites in DNA having a known sequence are well known, for example, M13
primer mutagenesis and PCR mutagenesis. Screening of the mutants is done using assays of
angiogenesis protein activities. 
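
The "random mutagenesis at the target codon, then screen" strategy can be sketched at the protein level in Python. This is an illustration under stated assumptions: it enumerates all 19 single-residue substitutions at one predetermined position and keeps the variant that scores highest in a supplied assay function. The assay used in the usage line is a hypothetical stand-in for an angiogenesis protein activity assay, and the function names are not from the source.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def substitution_variants(protein, position):
    """Yield (label, sequence) for all 19 single substitutions at a 1-based position."""
    if not 1 <= position <= len(protein):
        raise ValueError("position outside the sequence")
    index = position - 1
    wild_type = protein[index]
    for residue in AMINO_ACIDS:
        if residue != wild_type:
            label = f"{wild_type}{position}{residue}"  # e.g. "T3W"
            yield label, protein[:index] + residue + protein[index + 1:]


def best_variant(protein, position, assay):
    """Return the (label, sequence) pair scoring highest in the supplied assay."""
    return max(substitution_variants(protein, position),
               key=lambda item: assay(item[1]))


# Hypothetical assay: here, simply the tryptophan count of the variant.
print(best_variant("MKTAY", 3, lambda seq: seq.count("W")))  # ('T3W', 'MKWAY')
```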

Amino acid substitutions are typically of single residues; insertions usually 
will be on the order of from about 1 to 20 amino acids, although considerably larger 
insertions may be tolerated. Deletions range from about 1 to about 20 residues, although in
some cases deletions may be much larger. 

Substitutions, deletions, insertions or any combination thereof may be used to 
arrive at a final derivative. Generally these changes are done on a few amino acids to 
minimize the alteration of the molecule. However, larger changes may be tolerated in certain 
circumstances. When small alterations in the characteristics of the angiogenesis protein are 
desired, substitutions are generally made in accordance with the amino acid substitution chart 
provided in the definition section. 

Substantial changes in function or immunological identity are made by 
selecting substitutions that are less conservative than those provided in the definition of 
"conservative substitution". For example, substitutions may be made which more 
significantly affect: the structure of the polypeptide backbone in the area of the alteration, for 
example the alpha-helical or beta-sheet structure; the charge or hydrophobicity of the
molecule at the target site; or the bulk of the side chain. The substitutions which in general 
are expected to produce the greatest changes in the polypeptide's properties are those in 
which (a) a hydrophilic residue, e.g. seryl or threonyl, is substituted for (or by) a hydrophobic 
residue, e.g. leucyl, isoleucyl, phenylalanyl, valyl or alanyl; (b) a cysteine or proline is 
substituted for (or by) any other residue; (c) a residue having an electropositive side chain, 
e.g. lysyl, arginyl, or histidyl, is substituted for (or by) an electronegative residue, e.g. 
glutamyl or aspartyl; or (d) a residue having a bulky side chain, e.g. phenylalanine, is 
substituted for (or by) one not having a side chain, e.g. glycine. 
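
The four classes of substantial substitution listed above can be expressed as a small check. The Python sketch below encodes only the example residues the text names for each property (for instance, phenylalanine as the bulky example and glycine as the residue without a side chain), so it is a partial illustration, not a complete physicochemical classification; the function names are hypothetical.

```python
# Example residues named in the text for each property (one-letter codes).
HYDROPHILIC = set("ST")         # seryl, threonyl
HYDROPHOBIC = set("LIFVA")      # leucyl, isoleucyl, phenylalanyl, valyl, alanyl
CYS_PRO = set("CP")             # cysteine, proline
ELECTROPOSITIVE = set("KRH")    # lysyl, arginyl, histidyl
ELECTRONEGATIVE = set("ED")     # glutamyl, aspartyl
BULKY = set("F")                # phenylalanine
NO_SIDE_CHAIN = set("G")        # glycine


def _swaps_between(a, b, group_1, group_2):
    """True if the substitution moves a residue from one group to the other, either way."""
    return (a in group_1 and b in group_2) or (a in group_2 and b in group_1)


def substantial_change_categories(original, replacement):
    """Return which of criteria (a)-(d) a single substitution meets."""
    categories = []
    if _swaps_between(original, replacement, HYDROPHILIC, HYDROPHOBIC):
        categories.append("a")
    if original != replacement and (original in CYS_PRO or replacement in CYS_PRO):
        categories.append("b")
    if _swaps_between(original, replacement, ELECTROPOSITIVE, ELECTRONEGATIVE):
        categories.append("c")
    if _swaps_between(original, replacement, BULKY, NO_SIDE_CHAIN):
        categories.append("d")
    return categories


print(substantial_change_categories("S", "L"))  # ['a']
print(substantial_change_categories("K", "E"))  # ['c']
print(substantial_change_categories("L", "I"))  # []
```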

The variants typically exhibit the same qualitative biological activity and will 
elicit the same immune response as the naturally-occurring analog, although variants also are 
selected to modify the characteristics of the angiogenesis proteins as needed. Alternatively, 
the variant may be designed such that the biological activity of the angiogenesis protein is 
altered. For example, glycosylation sites may be altered or removed. 

Covalent modifications of angiogenesis polypeptides are included within the 
scope of this invention. One type of covalent modification includes reacting targeted amino 
acid residues of an angiogenesis polypeptide with an organic derivatizing agent that is 
capable of reacting with selected side chains or the N- or C-terminal residues of an
angiogenesis polypeptide. Derivatization with bifunctional agents is useful, for instance, for 
crosslinking angiogenesis polypeptides to a water-insoluble support matrix or surface for use 
in the method for purifying anti-angiogenesis polypeptide antibodies or screening assays, as 
is more fully described below. Commonly used crosslinking agents include, e.g.,
1,1-bis(diazoacetyl)-2-phenylethane, glutaraldehyde, N-hydroxysuccinimide esters, for example,
esters with 4-azidosalicylic acid, homobifunctional imidoesters, including disuccinimidyl
esters such as 3,3'-dithiobis(succinimidylpropionate), bifunctional maleimides such as
bis-N-maleimido-1,8-octane, and agents such as methyl-3-[(p-azidophenyl)dithio]propioimidate.

Other modifications include deamidation of glutaminyl and asparaginyl 
residues to the corresponding glutamyl and aspartyl residues, respectively, hydroxylation of 
proline and lysine, phosphorylation of hydroxyl groups of seryl, threonyl or tyrosyl residues, 
methylation of the α-amino groups of lysine, arginine, and histidine side chains [T.E.
Creighton, Proteins: Structure and Molecular Properties, W.H. Freeman & Co., San 
Francisco, pp. 79-86 (1983)], acetylation of the N-terminal amine, and amidation of any C- 
terminal carboxyl group. 

Another type of covalent modification of the angiogenesis polypeptide 
included within the scope of this invention comprises altering the native glycosylation pattern 
of the polypeptide. "Altering the native glycosylation pattern" is intended for purposes herein 
to mean deleting one or more carbohydrate moieties found in native sequence angiogenesis 
polypeptide, and/or adding one or more glycosylation sites that are not present in the native 
sequence angiogenesis polypeptide. Glycosylation patterns can be altered in many ways. For 
example the use of different cell types to express angiogenesis-associated sequences can 
result in different glycosylation patterns. 

Addition of glycosylation sites to angiogenesis polypeptides may also be 
accomplished by altering the amino acid sequence thereof. The alteration may be made, for 
example, by the addition of, or substitution by, one or more serine or threonine residues to the 
native sequence angiogenesis polypeptide (for O-linked glycosylation sites). The 
angiogenesis amino acid sequence may optionally be altered through changes at the DNA 
level, particularly by mutating the DNA encoding the angiogenesis polypeptide at preselected 
bases such that codons are generated that will translate into the desired amino acids. 

Another means of increasing the number of carbohydrate moieties on the 
angiogenesis polypeptide is by chemical or enzymatic coupling of glycosides to the 
polypeptide. Such methods are described in the art, e.g., in WO 87/05330 published 11
September 1987, and in Aplin and Wriston, CRC Crit. Rev. Biochem., pp. 259-306 (1981).

Removal of carbohydrate moieties present on the angiogenesis polypeptide 
may be accomplished chemically or enzymatically or by mutational substitution of codons 
encoding for amino acid residues that serve as targets for glycosylation. Chemical 
deglycosylation techniques are known in the art and described, for instance, by Hakimuddin
et al., Arch. Biochem. Biophys., 259:52 (1987) and by Edge et al., Anal. Biochem., 118:131
(1981). Enzymatic cleavage of carbohydrate moieties on polypeptides can be achieved by the
use of a variety of endo- and exo-glycosidases as described by Thotakura et al., Meth.
Enzymol., 138:350 (1987).
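
Removing glycosylation targets by codon substitution first requires locating them. The text does not specify which motifs are targeted, so the Python sketch below assumes the generally recognized N-linked consensus Asn-X-Ser/Thr (X not Pro). It lists candidate sites and proposes an Asn-to-Gln substitution at each; the Gln choice is an illustrative assumption. Not every sequon is actually glycosylated, and O-linked sites lack a comparably simple consensus, so they are not covered.

```python
import re

# N-X-S/T with X not proline; the lookahead lets overlapping motifs all be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_n_glycosylation_sequons(protein):
    """Return 1-based positions of Asn residues that start an N-X-S/T (X != P) motif."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


def remove_sequons(protein, replacement="Q"):
    """Substitute each motif Asn (default Asn -> Gln, an assumed choice)."""
    residues = list(protein.upper())
    for position in find_n_glycosylation_sequons(protein):
        residues[position - 1] = replacement
    return "".join(residues)


print(find_n_glycosylation_sequons("MNGTANPSKNAS"))  # [2, 10]
print(remove_sequons("MNGTANPSKNAS"))                # MQGTANPSKQAS
```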

Another type of covalent modification of angiogenesis polypeptides comprises linking the
angiogenesis polypeptide to one of a variety of nonproteinaceous polymers, e.g.,
polyethylene glycol, polypropylene glycol, or polyoxyalkylenes, in the manner set forth in 
U.S. Patent Nos. 4,640,835; 4,496,689; 4,301,144; 4,670,417; 4,791,192 or 4,179,337. 

Angiogenesis polypeptides of the present invention may also be modified in a 
way to form chimeric molecules comprising an angiogenesis polypeptide fused to another, 
heterologous polypeptide or amino acid sequence. In one embodiment, such a chimeric 
molecule comprises a fusion of an angiogenesis polypeptide with a tag polypeptide which 
provides an epitope to which an anti-tag antibody can selectively bind. The epitope tag is 
generally placed at the amino- or carboxyl-terminus of the angiogenesis polypeptide. The
presence of such epitope-tagged forms of an angiogenesis polypeptide can be detected using 
an antibody against the tag polypeptide. Also, provision of the epitope tag enables the 
angiogenesis polypeptide to be readily purified by affinity purification using an anti-tag 
antibody or another type of affinity matrix that binds to the epitope tag. In an alternative 
embodiment, the chimeric molecule may comprise a fusion of an angiogenesis polypeptide 
with an immunoglobulin or a particular region of an immunoglobulin. For a bivalent form of 
the chimeric molecule, such a fusion could be to the Fc region of an IgG molecule. 

Various tag polypeptides and their respective antibodies are well known in the 
art. Examples include poly-histidine (poly-his) or poly-histidine-glycine (poly-his-gly) tags;
HIS6 and metal chelation tags; the flu HA tag polypeptide and its antibody 12CA5 [Field et
al., Mol. Cell. Biol., 8:2159-2165 (1988)]; the c-myc tag and the 8F9, 3C7, 6E10, G4, B7 and
9E10 antibodies thereto [Evan et al., Molecular and Cellular Biology, 5:3610-3616 (1985)];
and the Herpes Simplex virus glycoprotein D (gD) tag and its antibody [Paborsky et al.,
Protein Engineering, 3(6):547-553 (1990)]. Other tag polypeptides include the Flag-peptide
[Hopp et al., BioTechnology, 6:1204-1210 (1988)]; the KT3 epitope peptide [Martin et al.,
Science, 255:192-194 (1992)]; the tubulin epitope peptide [Skinner et al., J. Biol. Chem.,
266:15163-15166 (1991)]; and the T7 gene 10 protein peptide tag [Lutz-Freyermuth et al.,
Proc. Natl. Acad. Sci. USA, 87:6393-6397 (1990)].
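
At the sequence level, an epitope-tagged chimera is the polypeptide with a short peptide joined to one terminus. The Python sketch below uses commonly cited peptide sequences for four of the tags above (His6, FLAG, the HA epitope recognized by 12CA5 and the c-myc epitope recognized by 9E10). The optional linker, the function name and the rule of keeping an initiator methionine first for N-terminal tags are illustrative assumptions, not details from the source.

```python
# Commonly cited epitope-tag peptide sequences (one-letter code).
EPITOPE_TAGS = {
    "His6": "HHHHHH",
    "FLAG": "DYKDDDDK",
    "HA": "YPYDVPDYA",       # influenza hemagglutinin epitope recognized by 12CA5
    "c-myc": "EQKLISEEDL",   # epitope recognized by 9E10
}


def add_epitope_tag(protein, tag, terminus="C", linker=""):
    """Return the protein sequence with the named tag fused at the N- or C-terminus."""
    tag_seq = EPITOPE_TAGS[tag]
    if terminus == "C":
        return protein + linker + tag_seq
    if terminus == "N":
        # Assumption: keep an initiator methionine at the very start, if present.
        if protein.startswith("M"):
            return "M" + tag_seq + linker + protein[1:]
        return tag_seq + linker + protein
    raise ValueError("terminus must be 'N' or 'C'")


print(add_epitope_tag("MKTAY", "His6", terminus="C", linker="GS"))  # MKTAYGSHHHHHH
print(add_epitope_tag("MKTAY", "FLAG", terminus="N"))               # MDYKDDDDKKTAY
```
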

Also included within one embodiment of angiogenesis proteins are other
angiogenesis proteins of the angiogenesis family, and angiogenesis proteins from other 
organisms, which are cloned and expressed as outlined below. Thus, probe or degenerate 
polymerase chain reaction (PCR) primer sequences may be used to find other related 
angiogenesis proteins from humans or other organisms. As will be appreciated by those in 
the art, particularly useful probe and/or PCR primer sequences include the unique areas of the 
angiogenesis nucleic acid sequence. As is generally known in the art, preferred PCR primers 
are from about 15 to about 35 nucleotides in length, with from about 20 to about 30 being 
preferred, and may contain inosine as needed. The conditions for the PCR reaction are well
known in the art (e.g., Innis, PCR Protocols, supra).
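
A quick screen of candidate primers against the length ranges above might look like the following Python sketch. The GC percentage and the Wallace-rule melting temperature, Tm = 2(A+T) + 4(G+C) °C, are general rules of thumb added here as assumptions. The Wallace rule is only a rough estimate for short oligonucleotides, and inosine ("I") is accepted but counted in neither base-pair total.

```python
def check_primer(primer):
    """Summarize a PCR primer against the 15-35 nt and preferred 20-30 nt ranges."""
    primer = primer.upper()
    if not primer:
        raise ValueError("primer is empty")
    unexpected = set(primer) - set("ACGTI")
    if unexpected:
        raise ValueError(f"invalid primer characters: {sorted(unexpected)}")
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")  # inosine counted in neither total
    return {
        "length": len(primer),
        "in_general_range": 15 <= len(primer) <= 35,
        "in_preferred_range": 20 <= len(primer) <= 30,
        "gc_percent": round(100.0 * gc / len(primer), 1),
        "wallace_tm_c": 2 * at + 4 * gc,  # rough estimate for short oligos only
    }


print(check_primer("ATGCATGCATGCATGCATGC"))
# {'length': 20, 'in_general_range': True, 'in_preferred_range': True,
#  'gc_percent': 50.0, 'wallace_tm_c': 60}
```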

In addition, as is outlined herein, angiogenesis proteins can be made that are 
longer than those encoded by the nucleic acids of the figures, e.g., by the elucidation of 
extended sequences, the addition of epitope or purification tags, the addition of other fusion 
sequences, etc. 

Angiogenesis proteins may also be identified as being encoded by 
angiogenesis nucleic acids. Thus, angiogenesis proteins are encoded by nucleic acids that 
will hybridize to the sequences of the sequence listings, or their complements, as outlined 
herein. 

In a preferred embodiment, when the angiogenesis protein is to be used to 
generate antibodies, e.g., for immunotherapy or immunodiagnosis, the angiogenesis protein 
should share at least one epitope or determinant with the full length protein. By "epitope" or 
"determinant" herein is typically meant a portion of a protein which will generate and/or bind 
an antibody or T-cell receptor in the context of MHC. Thus, in most instances, antibodies 
made to a smaller angiogenesis protein will be able to bind to the full-length protein, 
particularly linear epitopes. In a preferred embodiment, the epitope is unique; that is, 
antibodies generated to a unique epitope show little or no cross-reactivity. In a preferred 
embodiment, the epitope is selected from a protein sequence set out in Table 2. 

Methods of preparing polyclonal antibodies are known to the skilled artisan 
(e.g., Coligan, supra; and Harlow & Lane, supra). Polyclonal antibodies can be raised in a 
mammal, e.g., by one or more injections of an immunizing agent and, if desired, an adjuvant.
Typically, the immunizing agent and/or adjuvant will be injected in the mammal by multiple 
subcutaneous or intraperitoneal injections. The immunizing agent may include a protein 
encoded by a nucleic acid of the figures or fragment thereof or a fusion protein thereof. It 
may be useful to conjugate the immunizing agent to a protein known to be immunogenic in 
the mammal being immunized. Examples of such immunogenic proteins include but are not
limited to keyhole limpet hemocyanin, serum albumin, bovine thyroglobulin, and soybean 
trypsin inhibitor. Examples of adjuvants which may be employed include Freund's complete 
adjuvant and MPL-TDM adjuvant (monophosphoryl Lipid A, synthetic trehalose 
dicorynomycolate). The immunization protocol may be selected by one skilled in the art 
without undue experimentation. 

The antibodies may, alternatively, be monoclonal antibodies. Monoclonal 
antibodies may be prepared using hybridoma methods, such as those described by Kohler and 
Milstein, Nature, 256:495 (1975). In a hybridoma method, a mouse, hamster, or other 
appropriate host animal, is typically immunized with an immunizing agent to elicit 
lymphocytes that produce or are capable of producing antibodies that will specifically bind to 
the immunizing agent. Alternatively, the lymphocytes may be immunized in vitro. The 
immunizing agent will typically include a polypeptide encoded by a nucleic acid of Table 1, 
or fragment thereof, or a fusion protein thereof. Generally, either peripheral blood 
lymphocytes ("PBLs") are used if cells of human origin are desired, or spleen cells or lymph 
node cells are used if non-human mammalian sources are desired. The lymphocytes are then 
fused with an immortalized cell line using a suitable fusing agent, such as polyethylene 
glycol, to form a hybridoma cell [Goding, Monoclonal Antibodies: Principles and Practice, 
Academic Press, (1986) pp. 59-103]. Immortalized cell lines are usually transformed 
mammalian cells, particularly myeloma cells of rodent, bovine and human origin. Usually, 
rat or mouse myeloma cell lines are employed. The hybridoma cells may be cultured in a 
suitable culture medium that preferably contains one or more substances that inhibit the 
growth or survival of the unfused, immortalized cells. For example, if the parental cells lack 
the enzyme hypoxanthine guanine phosphoribosyl transferase (HGPRT or HPRT), the culture 
medium for the hybridomas typically will include hypoxanthine, aminopterin, and thymidine 
("HAT medium"), which substances prevent the growth of HGPRT-deficient cells. 
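
The HAT selection logic can be written as a simple survival rule. The Python sketch below is a toy model whose biological premises are general background, not statements from the text. Aminopterin blocks de novo nucleotide synthesis, so growth in HAT medium requires the HGPRT salvage pathway, which the lymphocyte partner supplies. Persistence in long-term culture also requires immortality, which the myeloma partner supplies. Only fused hybridoma cells satisfy both conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CultureCell:
    name: str
    immortal: bool   # contributed by the myeloma fusion partner
    has_hgprt: bool  # contributed by the lymphocyte fusion partner


def survives_in_hat(cell):
    """Toy rule: needs the HGPRT salvage pathway and the ability to grow indefinitely."""
    return cell.immortal and cell.has_hgprt


population = [
    CultureCell("unfused myeloma (HGPRT-deficient)", immortal=True, has_hgprt=False),
    CultureCell("unfused spleen lymphocyte", immortal=False, has_hgprt=True),
    CultureCell("hybridoma", immortal=True, has_hgprt=True),
]
print([cell.name for cell in population if survives_in_hat(cell)])  # ['hybridoma']
```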

In one embodiment, the antibodies are bispecific antibodies. Bispecific
antibodies are monoclonal, preferably human or humanized, antibodies that have binding
specificities for at least two different antigens or that have binding specificities for two
epitopes on the same antigen. In one embodiment, one of the binding specificities is for a
protein encoded by a nucleic acid of Table 1 or a fragment thereof, and the other one is for any other
antigen, and preferably for a cell-surface protein or receptor or receptor subunit, preferably
one that is tumor specific. Alternatively, tetramer-type technology may create multivalent 
reagents. 

In a preferred embodiment, the antibodies to angiogenesis protein are capable
of reducing or eliminating a biological function of an angiogenesis protein, as is described
below. That is, the addition of anti-angiogenesis protein antibodies (either polyclonal or
preferably monoclonal) to angiogenic tissue (or cells containing angiogenesis) may reduce or
eliminate the angiogenesis activity. Generally, at least a 25% decrease in activity is
preferred, with at least about 50% being particularly preferred and about a 95-100% decrease
being especially preferred.
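
The percentage thresholds above correspond to a simple calculation against an untreated control. The Python sketch below assumes activity is measured on the same scale with and without antibody. The readouts in the usage loop are hypothetical, and the "about" ranges in the text are treated as exact cutoffs for illustration.

```python
def percent_decrease(untreated_activity, treated_activity):
    """Percent reduction in activity relative to an untreated control."""
    if untreated_activity <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * (untreated_activity - treated_activity) / untreated_activity


def preference_level(decrease):
    """Map a percent decrease onto the preference levels described above."""
    if decrease >= 95.0:
        return "especially preferred"
    if decrease >= 50.0:
        return "particularly preferred"
    if decrease >= 25.0:
        return "preferred"
    return "below the preferred range"


# Hypothetical (control, antibody-treated) readouts in arbitrary units.
for control, treated in [(200.0, 80.0), (100.0, 2.0), (100.0, 90.0)]:
    decrease = percent_decrease(control, treated)
    print(f"{decrease:.0f}% -> {preference_level(decrease)}")
# 60% -> particularly preferred
# 98% -> especially preferred
# 10% -> below the preferred range
```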

In a preferred embodiment the antibodies to the angiogenesis proteins are
humanized antibodies (e.g., Xenerex Biosciences, Medarex, Inc., Abgenix, Inc., Protein
Design Labs, Inc.). Humanized forms of non-human (e.g., murine) antibodies are chimeric
molecules of immunoglobulins, immunoglobulin chains or fragments thereof (such as Fv,
Fab, Fab', F(ab')2 or other antigen-binding subsequences of antibodies) which contain
minimal sequence derived from non-human immunoglobulin. Humanized antibodies include
human immunoglobulins (recipient antibody) in which residues from a complementarity
determining region (CDR) of the recipient are replaced by residues from a CDR of a non-human
species (donor antibody) such as mouse, rat or rabbit having the desired specificity,
affinity and capacity. In some instances, Fv framework residues of the human
immunoglobulin are replaced by corresponding non-human residues. Humanized antibodies
may also comprise residues which are found neither in the recipient antibody nor in the
imported CDR or framework sequences. In general, a humanized antibody will comprise
substantially all of at least one, and typically two, variable domains, in which all or
substantially all of the CDR regions correspond to those of a non-human immunoglobulin
and all or substantially all of the framework (FR) regions are those of a human
immunoglobulin consensus sequence. The humanized antibody optimally also will comprise
at least a portion of an immunoglobulin constant region (Fc), typically that of a human
immunoglobulin [Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature,
332:323-327 (1988); and Presta, Curr. Op. Struct. Biol., 2:593-596 (1992)].

Methods for humanizing non-human antibodies are well known in the art.
Generally, a humanized antibody has one or more amino acid residues introduced into it from
a source which is non-human. These non-human amino acid residues are often referred to as
import residues, which are typically taken from an import variable domain. Humanization
can essentially be performed following the method of Winter and co-workers [Jones et al.,
Nature, 321:522-525 (1986); Riechmann et al., Nature, 332:323-327 (1988); Verhoeyen et
al., Science, 239:1534-1536 (1988)], by substituting rodent CDRs or CDR sequences for the
corresponding sequences of a human antibody. Accordingly, such humanized antibodies are
chimeric antibodies (U.S. Patent No. 4,816,567), wherein substantially less than an intact
human variable domain has been substituted by the corresponding sequence from a non-human
species. In practice, humanized antibodies are typically human antibodies in which
some CDR residues and possibly some FR residues are substituted by residues from
analogous sites in rodent antibodies.

Human antibodies can also be produced using various techniques known in the
art, including phage display libraries [Hoogenboom and Winter, J. Mol. Biol., 227:381
(1991); Marks et al., J. Mol. Biol., 222:581 (1991)]. The techniques of Cole et al. and
Boerner et al. are also available for the preparation of human monoclonal antibodies [Cole et
al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, p. 77 (1985); Boerner et
al., J. Immunol., 147(1):86-95 (1991)]. Similarly, human antibodies can be made by
introducing human immunoglobulin loci into transgenic animals, e.g., mice in which the
endogenous immunoglobulin genes have been partially or completely inactivated. Upon
challenge, human antibody production is observed that closely resembles that seen in
humans in all respects, including gene rearrangement, assembly, and antibody repertoire.
This approach is described, for example, in U.S. Patent Nos. 5,545,807; 5,545,806;
5,569,825; 5,625,126; 5,633,425; and 5,661,016, and in the following scientific publications:
Marks et al., Bio/Technology 10, 779-783 (1992); Lonberg et al., Nature 368, 856-859 (1994);
Morrison, Nature 368, 812-13 (1994); Fishwild et al., Nature Biotechnology 14, 845-51
(1996); Neuberger, Nature Biotechnology 14, 826 (1996); Lonberg and Huszar, Intern. Rev.
Immunol. 13, 65-93 (1995).

By immunotherapy is meant treatment of angiogenesis with an antibody raised
against angiogenesis proteins. As used herein, immunotherapy can be passive or active.
Passive immunotherapy as defined herein is the passive transfer of antibody to a recipient
(patient). Active immunization is the induction of antibody and/or T-cell responses in a
recipient (patient). Induction of an immune response is the result of providing the recipient
with an antigen to which antibodies are raised. As appreciated by one of ordinary skill in the
art, the antigen may be provided by injecting into a recipient a polypeptide against which
antibodies are desired to be raised, or by contacting the recipient with a nucleic acid capable of
expressing the antigen under conditions for expression of the antigen, leading to an
immune response.

In a preferred embodiment, the angiogenesis proteins against which antibodies
are raised are secreted proteins as described above. Without being bound by theory,
antibodies used for treatment bind the secreted protein and prevent it from binding to its
receptor, thereby inactivating the secreted angiogenesis protein.

In another preferred embodiment, the angiogenesis protein to which antibodies
are raised is a transmembrane protein. Without being bound by theory, antibodies used for
treatment bind the extracellular domain of the angiogenesis protein and prevent it from
binding to other proteins, such as circulating ligands or cell-associated molecules. The
antibody may cause down-regulation of the transmembrane angiogenesis protein. As will be
appreciated by one of ordinary skill in the art, the antibody may be a competitive,
non-competitive or uncompetitive inhibitor of protein binding to the extracellular domain of the
angiogenesis protein. The antibody may also be an antagonist of the angiogenesis protein
and may prevent activation of the transmembrane angiogenesis protein. In one
aspect, when the antibody prevents the binding of other molecules to the angiogenesis
protein, the antibody prevents growth of the cell. The antibody may also be used to target or
sensitize the cell to cytotoxic agents, including, but not limited to, TNF-α, TNF-β, IL-1, IFN-γ
and IL-2, or chemotherapeutic agents including 5FU, vinblastine, actinomycin D, cisplatin,
methotrexate, and the like. In some instances the antibody belongs to a sub-type that
activates serum complement when complexed with the transmembrane protein, thereby
mediating cytotoxicity, or antibody-dependent cellular cytotoxicity (ADCC). Thus, angiogenesis is
treated by administering to a patient antibodies directed against the transmembrane
angiogenesis protein. Antibody labeling may activate a co-toxin, localize a toxin payload, or
otherwise provide a means to locally ablate cells.

In another preferred embodiment, the antibody is conjugated to an effector
moiety. The effector moiety can be any number of molecules, including labeling moieties
such as radioactive labels or fluorescent labels, or can be a therapeutic moiety. In one aspect
the therapeutic moiety is a small molecule that modulates the activity of the angiogenesis 
protein. In another aspect the therapeutic moiety modulates the activity of molecules 
associated with or in close proximity to the angiogenesis protein. The therapeutic moiety 
may inhibit enzymatic activity such as protease or collagenase activity associated with 
angiogenesis. 

In a preferred embodiment, the therapeutic moiety can also be a cytotoxic
agent. In this method, targeting the cytotoxic agent to angiogenic tissue or cells results in a
reduction in the number of afflicted cells, thereby reducing symptoms associated with
angiogenesis. Cytotoxic agents are numerous and varied and include, but are not limited to,
cytotoxic drugs or toxins or active fragments of such toxins. Suitable toxins and their
corresponding fragments include diphtheria A chain, exotoxin A chain, ricin A chain, abrin A
chain, curcin, crotin, phenomycin, enomycin and the like. Cytotoxic agents also include
radiochemicals made by conjugating radioisotopes to antibodies raised against angiogenesis
proteins, or by binding a radionuclide to a chelating agent that has been covalently attached to
the antibody. Targeting the therapeutic moiety to transmembrane angiogenesis proteins not
only serves to increase the local concentration of therapeutic moiety in the angiogenesis-afflicted
area, but also serves to reduce deleterious side effects that may be associated with
the therapeutic moiety.

In another preferred embodiment, the angiogenesis protein against which the 
antibodies are raised is an intracellular protein. In this case, the antibody may be conjugated 
to a protein which facilitates entry into the cell. In one case, the antibody enters the cell by 
endocytosis. In another embodiment, a nucleic acid encoding the antibody is administered to 
the individual or cell. Moreover, where the angiogenesis protein is localized to a particular
compartment within a cell, e.g., the nucleus, an antibody thereto may contain a signal for that
target localization, e.g., a nuclear localization signal.

The angiogenesis antibodies of the invention specifically bind to angiogenesis
proteins. By "specifically bind" herein is meant that the antibodies bind to the protein with a
dissociation constant (Kd) of about 0.1 mM or less, more usually about 1 μM or less, preferably
about 0.1 μM or less, and most preferably about 0.01 μM or less (a lower Kd indicates tighter
binding). Selectivity of binding is also important.
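The Kd tiers above can be made concrete with a small lookup. This is a minimal sketch, assuming Kd values are reported in mM, μM (written "uM"), nM or pM; the function name and the tier labels are hypothetical names for the preference levels stated in the text.

```python
from typing import Optional

# Conversion factors to molar (M); "uM" stands in for micromolar.
UNIT_TO_MOLAR = {"mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}

# Kd ceilings from the text, tightest first; a lower Kd means tighter binding.
KD_TIERS = [
    ("most preferred", 0.01e-6),  # 0.01 uM (10 nM) or better
    ("preferred", 0.1e-6),        # 0.1 uM or better
    ("more usual", 1e-6),         # 1 uM or better
    ("minimum", 0.1e-3),          # 0.1 mM or better
]


def kd_tier(value: float, unit: str) -> Optional[str]:
    """Return the tightest tier whose Kd ceiling the measured Kd meets."""
    kd_molar = value * UNIT_TO_MOLAR[unit]
    for name, ceiling in KD_TIERS:
        if kd_molar <= ceiling:
            return name
    return None  # weaker than 0.1 mM: outside "specific binding" as defined


print(kd_tier(50, "nM"))  # prints: preferred
```

Checking the tiers tightest-first means each value is reported at the most stringent level it satisfies.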

In a preferred embodiment, the angiogenesis protein is purified or isolated
after expression. Angiogenesis proteins may be isolated or purified in a variety of ways
known to those skilled in the art depending on what other components are present in the
sample. Standard purification methods include electrophoretic, molecular, immunological
and chromatographic techniques, including ion exchange, hydrophobic, affinity, and reverse-phase
HPLC chromatography, and chromatofocusing. For example, the angiogenesis protein
may be purified using a standard anti-angiogenesis protein antibody column. Ultrafiltration
and diafiltration techniques, in conjunction with protein concentration, are also useful. For
general guidance in suitable purification techniques, see Scopes, R., Protein Purification,
Springer-Verlag, NY (1982). The degree of purification necessary will vary depending on
the use of the angiogenesis protein. In some instances no purification will be necessary.
Once expressed and purified if necessary, the angiogenesis proteins and
nucleic acids are useful in a number of applications. They may be used as immunoselection 
reagents, as vaccine reagents, as screening agents, etc. 

Detection of angiogenesis sequences for diagnostic and therapeutic applications

In one aspect, the RNA expression levels of genes are determined for different
cellular states in the angiogenesis phenotype. Expression levels of genes in normal tissue
(i.e., not undergoing angiogenesis) and in angiogenic tissue (and in some cases, for varying
severities of angiogenesis that relate to prognosis, as outlined below) are evaluated to provide
expression profiles. An expression profile of a particular cell state or point of development is
essentially a "fingerprint" of the state. While two states may have any particular gene
similarly expressed, the evaluation of a number of genes simultaneously allows the
generation of a gene expression profile that is reflective of the state of the cell. By comparing
expression profiles of cells in different states, information regarding which genes are
important (including both up- and down-regulation of genes) in each of these states is
obtained. Diagnosis may then be performed or confirmed by determining whether a tissue
sample has the gene expression profile of normal or angiogenic tissue. This provides
for molecular diagnosis of related conditions.
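One way to make the "fingerprint" comparison concrete is a nearest-reference rule: correlate a sample's expression values with reference profiles for normal and angiogenic tissue and report the closer match. This is a minimal sketch, not the method of the invention; the use of Pearson correlation, the function names and the expression values are all assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def classify_profile(sample, references):
    """Return the label of the reference profile most correlated with sample."""
    return max(references, key=lambda label: pearson(sample, references[label]))


# Hypothetical expression levels for the same four genes, in the same order.
references = {
    "normal": [1.0, 1.0, 5.0, 2.0],
    "angiogenic": [4.0, 3.5, 1.0, 2.0],
}
print(classify_profile([3.8, 3.0, 1.2, 2.1], references))  # prints: angiogenic
```

Correlation compares the shape of the profile across many genes rather than any single gene, which matches the point that two states may share the expression level of an individual gene.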

"Differential expression," or grammatical equivalents as used herein, refers to
qualitative or quantitative differences in the temporal and/or cellular gene expression
patterns within and among cells and tissue. Thus, a differentially expressed gene can
qualitatively have its expression altered, including an activation or inactivation, in, e.g.,
normal versus angiogenic tissue. Genes may be turned on or turned off in a particular state,
relative to another state, thus permitting comparison of two or more states. A qualitatively
regulated gene will exhibit an expression pattern within a state or cell type which is
detectable by standard techniques. Some genes will be expressed in one state or cell type, but
not in both. Alternatively, the difference in expression may be quantitative, e.g., in that
expression is increased or decreased; i.e., gene expression is either upregulated, resulting in
an increased amount of transcript, or downregulated, resulting in a decreased amount of
transcript. The degree to which expression differs need only be large enough to quantify via
standard characterization techniques as outlined below, such as by use of Affymetrix
GeneChip™ expression arrays, Lockhart, Nature Biotechnology, 14:1675-1680 (1996),
hereby expressly incorporated by reference. Other techniques include, but are not limited to,
quantitative reverse transcriptase PCR, Northern analysis and RNase protection. As outlined
above, preferably the change in expression (i.e., upregulation or downregulation) is at least
about 50%, more preferably at least about 100%, more preferably at least about 150%, more
preferably at least about 200%, with from 300 to at least 1000% being especially preferred.
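The text does not say how a change above 100% is computed for downregulation (a plain percent decrease cannot exceed 100%). One reading that keeps such values meaningful in both directions is to express the change as the ratio of the larger level to the smaller one. A minimal sketch under that assumption; the function names are hypothetical.

```python
def percent_change(normal: float, angiogenic: float) -> float:
    """Percent change in expression between normal and angiogenic tissue.

    Assumes a fold-based convention so that up- and down-regulation are
    symmetric: (larger / smaller - 1) * 100, negative for down-regulation.
    """
    if angiogenic >= normal:
        return (angiogenic / normal - 1.0) * 100.0
    return -(normal / angiogenic - 1.0) * 100.0


def meets_threshold(normal: float, angiogenic: float, threshold: float = 50.0) -> bool:
    """True if the magnitude of change reaches the threshold (50% by default)."""
    return abs(percent_change(normal, angiogenic)) >= threshold


print(percent_change(100.0, 400.0))  # 4-fold up: 300.0
print(percent_change(400.0, 100.0))  # 4-fold down: -300.0
```

Under this convention a 100% change is a 2-fold difference and a 300% change is a 4-fold difference, whichever direction the gene moves.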

Evaluation may be at the gene transcript or the protein level. The amount of
gene expression may be monitored using nucleic acid probes to the DNA or RNA equivalent
of the gene transcript, and the quantification of gene expression levels, or, alternatively, the
final gene product itself (protein) can be monitored, e.g., with antibodies to the angiogenesis
protein and standard immunoassays (ELISAs, etc.) or other techniques, including mass
spectrometry assays, 2D gel electrophoresis assays, etc. Proteins corresponding to
angiogenesis genes, i.e., those identified as being important in an angiogenesis phenotype,
can be evaluated in an angiogenesis diagnostic test.

In a preferred embodiment, gene expression monitoring is performed 
simultaneously on a number of genes. Multiple protein expression monitoring can be 
performed as well. Similarly, these assays may be performed on an individual basis as well. 

In this embodiment, the angiogenesis nucleic acid probes are attached to
biochips as outlined herein for the detection and quantification of angiogenesis sequences in a 
particular cell. The assays are further described below in the example. PCR techniques can 
be used to provide greater sensitivity. 

In a preferred embodiment, nucleic acids encoding the angiogenesis protein are
detected. Although DNA or RNA encoding the angiogenesis protein may be detected, of
particular interest are methods wherein an mRNA encoding an angiogenesis protein is
detected. Probes to detect mRNA can be a nucleotide/deoxynucleotide probe that is
complementary to and hybridizes with the mRNA and includes, but is not limited to,
oligonucleotides, cDNA or RNA. Probes also should contain a detectable label, as defined
herein. In one method the mRNA is detected after immobilizing the nucleic acid to be
examined on a solid support such as nylon membranes and hybridizing the probe with the
sample. Following washing to remove the non-specifically bound probe, the label is
detected. In another method detection of the mRNA is performed in situ. In this method
permeabilized cells or tissue samples are contacted with a detectably labeled nucleic acid
probe for sufficient time to allow the probe to hybridize with the target mRNA. Following
washing to remove the non-specifically bound probe, the label is detected. For example, a
digoxigenin-labeled riboprobe (RNA probe) that is complementary to the mRNA encoding
an angiogenesis protein is detected by binding the digoxigenin with an anti-digoxigenin
secondary antibody and developed with nitro blue tetrazolium and 5-bromo-4-chloro-3-indolyl
phosphate.
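A probe that "is complementary to and hybridizes with the mRNA" is the reverse complement of the target, since the two strands pair antiparallel. A minimal sketch of deriving the probe sequence, assuming the mRNA is given 5' to 3'; the function name is hypothetical, and attaching a label (e.g., digoxigenin) is outside its scope.

```python
# Base-pairing partners for each mRNA base, for DNA and RNA probes.
DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}
RNA_PARTNER = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_probe(mrna: str, rna_probe: bool = False) -> str:
    """Return the probe sequence (5'->3') complementary to an mRNA (5'->3').

    Reversing and complementing yields the antiparallel strand that
    hybridizes to the target; 'T' in the input is treated as 'U'.
    """
    partner = RNA_PARTNER if rna_probe else DNA_PARTNER
    seq = mrna.upper().replace("T", "U")
    return "".join(partner[base] for base in reversed(seq))


print(antisense_probe("AUGGCC"))                  # DNA oligo: GGCCAT
print(antisense_probe("AUGGCC", rna_probe=True))  # riboprobe: GGCCAU
```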

In a preferred embodiment, various proteins from the three classes of proteins
as described herein (secreted, transmembrane or intracellular proteins) are used in diagnostic
assays. The angiogenesis proteins, antibodies, nucleic acids, modified proteins and cells
containing angiogenesis sequences are used in diagnostic assays. This can be performed on
an individual gene or corresponding polypeptide level. In a preferred embodiment, the
expression profiles are used, preferably in conjunction with high throughput screening
techniques to allow monitoring for expression profile genes and/or corresponding
polypeptides.

As described and defined herein, angiogenesis proteins, including
intracellular, transmembrane or secreted proteins, find use as markers of angiogenesis.
Detection of these proteins in putative angiogenic tissue allows for detection or diagnosis of
angiogenesis. In one embodiment, antibodies are used to detect angiogenesis proteins. A
preferred method separates proteins from a sample by electrophoresis on a gel (typically a
denaturing and reducing protein gel, but may be another type of gel, including isoelectric
focusing gels and the like). Following separation of proteins, the angiogenesis protein is
detected, e.g., by immunoblotting with antibodies raised against the angiogenesis protein.
Methods of immunoblotting are well known to those of ordinary skill in the art.

In another preferred method, antibodies to the angiogenesis protein find use in
in situ imaging techniques, e.g., in histology (e.g., Methods in Cell Biology: Antibodies in
Cell Biology, volume 37 (Asai, ed. 1993)). In this method cells are contacted with from one
to many antibodies to the angiogenesis protein(s). Following washing to remove non-specific
antibody binding, the presence of the antibody or antibodies is detected. In one embodiment
the antibody is detected by incubating with a secondary antibody that contains a detectable
label. In another method the primary antibody to the angiogenesis protein(s) contains a
detectable label, for example an enzyme marker that can act on a substrate. In another
preferred embodiment each one of multiple primary antibodies contains a distinct and
detectable label. This method finds particular use in simultaneous screening for a plurality of
angiogenesis proteins. As will be appreciated by one of ordinary skill in the art, many other
histological imaging techniques are also provided by the invention.

In a preferred embodiment the label is detected in a fluorometer which has the 
ability to detect and distinguish emissions of different wavelengths. In addition, a 
fluorescence activated cell sorter (FACS) can be used in the method. 
In another preferred embodiment, antibodies find use in diagnosing
angiogenesis from blood samples. As previously described, certain angiogenesis proteins are
secreted/circulating molecules. Blood samples, therefore, are useful as samples to be probed
or tested for the presence of secreted angiogenesis proteins. Antibodies can be used to detect
an angiogenesis protein by previously described immunoassay techniques including ELISA,
immunoblotting (Western blotting), immunoprecipitation, BIACORE technology and the
like. Conversely, the presence in a blood sample of antibodies to an angiogenesis protein may
indicate an immune response against an endogenous angiogenesis protein.

In a preferred embodiment, in situ hybridization of labeled angiogenesis 
nucleic acid probes to tissue arrays is done. For example, arrays of tissue samples, including 
angiogenesis tissue and/or normal tissue, are made. In situ hybridization (see, e.g., Ausubel,
supra) is then performed. When comparing the fingerprints between an individual and a 
standard, the skilled artisan can make a diagnosis, a prognosis, or a prediction based on the 
findings. It is further understood that the genes which indicate the diagnosis may differ from 
those which indicate the prognosis and molecular profiling of the condition of the cells may 
lead to distinctions between responsive or refractory conditions or may be predictive of 
outcomes. 

In a preferred embodiment, the angiogenesis proteins, antibodies, nucleic 
acids, modified proteins and cells containing angiogenesis sequences are used in prognosis 
assays. As above, gene expression profiles can be generated that correlate to angiogenesis 
severity, in terms of long term prognosis. Again, this may be done on either a protein or gene 
level, with the use of genes being preferred. As above, angiogenesis probes may be attached 
to biochips for the detection and quantification of angiogenesis sequences in a tissue or 
patient. The assays proceed as outlined above for diagnosis. PCR methods may provide more
sensitive and accurate quantification.

In a preferred embodiment, members of the three classes of proteins as
described herein are used in drug screening assays. The angiogenesis proteins, antibodies,
nucleic acids, modified proteins and cells containing angiogenesis sequences are used in drug
screening assays, either individually or by evaluating the effect of drug candidates on a "gene
expression profile" or expression profile of polypeptides. In a preferred embodiment, the
expression profiles are used, preferably in conjunction with high throughput screening
techniques to allow monitoring for expression profile genes after treatment with a candidate
agent (e.g., Zlokarnik et al., Science 279, 84-8 (1998); Heid, Genome Res 6:986-94, 1996).
In a preferred embodiment, the angiogenesis proteins, antibodies, nucleic
acids, modified proteins and cells containing the native or modified angiogenesis proteins are 
used in screening assays. That is, the present invention provides novel methods for screening 
for compositions which modulate the angiogenesis phenotype or an identified physiological 
function of an angiogenesis protein. As above, this can be done on an individual gene level 
or by evaluating the effect of drug candidates on a "gene expression profile". In a preferred 
embodiment, the expression profiles are used, preferably in conjunction with high throughput 
screening techniques to allow monitoring for expression profile genes after treatment with a 
candidate agent, see Zlokarnik, supra. 

Having identified the differentially expressed genes herein, a variety of assays 
may be executed. In a preferred embodiment, assays may be run on an individual gene or 
protein level. That is, having identified a particular gene as upregulated in angiogenesis, test
compounds can be screened for the ability to modulate gene expression or for binding to the
angiogenic protein. "Modulation" thus includes both an increase and a decrease in gene
expression. The preferred amount of modulation will depend on the original change of the
gene expression in normal versus tissue undergoing angiogenesis, with changes of at least
10%, preferably at least 50%, more preferably 100-300%, and in some embodiments 300-1000% or
greater. Thus, if a gene exhibits a 4-fold increase in angiogenic tissue compared to normal 
tissue, a decrease of about four-fold is often desired; similarly, a 10-fold decrease in 
angiogenic tissue compared to normal tissue often provides a target value of a 10-fold 
increase in expression to be induced by the test compound. 
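The worked numbers above (a 4-fold increase calling for about a 4-fold decrease, a 10-fold decrease calling for a 10-fold increase) amount to taking the ratio of normal to angiogenic expression. A minimal sketch with hypothetical function names; the log-scale "fraction restored" score for partial responses is an added assumption, not something the text specifies.

```python
from math import log


def target_fold_change(normal: float, angiogenic: float) -> float:
    """Fold change a test compound should induce to return to the normal level.

    A gene 4-fold up in angiogenic tissue gives 0.25 (a 4-fold decrease);
    a gene 10-fold down gives 10.0 (a 10-fold increase).
    """
    return normal / angiogenic


def fraction_restored(normal: float, angiogenic: float, treated: float) -> float:
    """Fraction of the normal-vs-angiogenic difference, on a log scale,
    reversed by treatment (0 = no effect, 1 = back to normal)."""
    return log(angiogenic / treated) / log(angiogenic / normal)


print(target_fold_change(1.0, 4.0))   # 0.25
print(target_fold_change(10.0, 1.0))  # 10.0
```

A log scale is used so that halving a 4-fold elevation (4 to 2) scores as half of the way back to normal.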

The amount of gene expression may be monitored using nucleic acid probes 
and the quantification of gene expression levels, or, alternatively, the gene product itself can 
be monitored, e.g., through the use of antibodies to the angiogenesis protein and standard 
immunoassays. Proteomics and separation techniques may also allow quantification of 
expression. 

In a preferred embodiment, gene expression or protein monitoring of a number
of entities, i.e., an expression profile, is performed simultaneously. Such profiles will
typically involve a plurality of the entities described herein.

In this embodiment, the angiogenesis nucleic acid probes are attached to
biochips as outlined herein for the detection and quantification of angiogenesis sequences in a
particular cell. Alternatively, PCR may be used. Thus, for example, a series of microtiter plates
may be used with primers dispensed into the desired wells. A PCR reaction can then be
performed and analyzed for each well.
Modulators of angiogenesis

Expression monitoring can be performed to identify compounds that modify 
the expression of one or more angiogenesis-associated sequences, e.g., a polynucleotide 
sequence set out in Table 1. Generally, in a preferred embodiment, a test modulator is added
to the cells prior to analysis. Moreover, screens are also provided to identify agents that 
modulate angiogenesis, modulate angiogenesis proteins, bind to an angiogenesis protein, or 
interfere with the binding of an angiogenesis protein and an antibody or other binding 
partner. 

The term "test compound" or "drug candidate" or "modulator" or grammatical 
equivalents as used herein describes any molecule, e.g., protein, oligopeptide, small organic 
molecule, polysaccharide, polynucleotide, etc., to be tested for the capacity to directly or 
indirectly alter the angiogenesis phenotype or the expression of an angiogenesis sequence, 
e.g., a nucleic acid or protein sequence. In preferred embodiments, modulators alter 
expression profiles, or expression profile nucleic acids or proteins provided herein. In one 
embodiment, the modulator suppresses an angiogenesis phenotype, for example to a normal 
tissue fingerprint. In another embodiment, a modulator induces an angiogenesis phenotype.
Generally, a plurality of assay mixtures are run in parallel with different agent concentrations 
to obtain a differential response to the various concentrations. Typically, one of these 
concentrations serves as a negative control, i.e., at zero concentration or below the level of 
detection. 
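A minimal sketch of reading such a concentration series, assuming each assay mixture yields a single numeric signal. Expressing every signal relative to the zero-concentration negative control makes the differential response across concentrations directly comparable. The function name and the readings are hypothetical.

```python
def percent_of_control(readings):
    """Express each reading as a percent of the zero-concentration control.

    readings maps agent concentration -> assay signal; the 0.0 entry is the
    negative control described in the text.
    """
    control = readings[0.0]
    return {conc: 100.0 * signal / control for conc, signal in sorted(readings.items())}


# Hypothetical signals from assay mixtures run in parallel at four concentrations.
readings = {0.0: 200.0, 0.1: 190.0, 1.0: 150.0, 10.0: 50.0}
print(percent_of_control(readings))
```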

In one aspect, a modulator will neutralize the effect of an angiogenesis protein. 
By "neutralize" is meant that activity of a protein is inhibited or blocked and thereby has 
substantially no effect on a cell.

In certain embodiments, combinatorial libraries of potential modulators will be 
screened for an ability to bind to an angiogenesis polypeptide or to modulate activity. 
Conventionally, new chemical entities with useful properties are generated by identifying a 
chemical compound (called a "lead compound") with some desirable property or activity, 
e.g., inhibiting activity, creating variants of the lead compound, and evaluating the property 
and activity of those variant compounds. Often, high throughput screening (HTS) methods
are employed for such an analysis.

In one preferred embodiment, high throughput screening methods involve 
providing a library containing a large number of potential therapeutic compounds (candidate 
compounds). Such "combinatorial chemical libraries" are then screened in one or more 
assays to identify those library members (particular chemical species or subclasses) that
display a desired characteristic activity. The compounds thus identified can serve as 
conventional "lead compounds" or can themselves be used as potential or actual therapeutics. 

A combinatorial chemical library is a collection of diverse chemical 
compounds generated by either chemical synthesis or biological synthesis by combining a 
number of chemical "building blocks" such as reagents. For example, a linear combinatorial 
chemical library, such as a polypeptide (e.g., mutein) library, is formed by combining a set of 
chemical building blocks called amino acids in every possible way for a given compound 
length (i.e., the number of amino acids in a polypeptide compound). Millions of chemical 
compounds can be synthesized through such combinatorial mixing of chemical building 
blocks (Gallop et al. (1994) J. Med. Chem. 37(9): 1233-1251).
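Combining n building blocks "in every possible way" at length L gives n^L compounds, which is how millions of compounds arise from short peptide lengths: 20 amino acids at length 5 already give 20^5 = 3,200,000 pentapeptides. A minimal sketch, assuming only the 20 standard amino acids and no modified residues; the function names are hypothetical.

```python
from itertools import product

# One-letter codes for the 20 standard amino acids (the "building blocks").
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def library_size(n_blocks: int, length: int) -> int:
    """Number of distinct linear compounds from n_blocks at a given length."""
    return n_blocks ** length


def enumerate_library(blocks: str, length: int):
    """Yield every sequence of the given length, one building block per position."""
    for combo in product(blocks, repeat=length):
        yield "".join(combo)


print(library_size(len(AMINO_ACIDS), 5))  # 3200000 pentapeptides
print(list(enumerate_library("AG", 2)))   # ['AA', 'AG', 'GA', 'GG']
```

The generator form avoids holding an entire library in memory, which matters once the count reaches the millions.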

Preparation and screening of combinatorial chemical libraries is well known to
those of skill in the art. Such combinatorial chemical libraries include, but are not limited to,
peptide libraries (see, e.g., U.S. Patent No. 5,010,175; Furka (1991) Int. J. Pept. Prot. Res.,
37: 487-493; Houghton et al. (1991) Nature, 354: 84-88), peptoids (PCT Publication No. WO
91/19735, 26 Dec. 1991), encoded peptides (PCT Publication WO 93/20242, 14 Oct. 1993),
random bio-oligomers (PCT Publication WO 92/00091, 9 Jan. 1992), benzodiazepines (U.S.
Pat. No. 5,288,514), diversomers such as hydantoins, benzodiazepines and dipeptides (Hobbs
et al. (1993) Proc. Nat. Acad. Sci. USA 90: 6909-6913), vinylogous polypeptides (Hagihara
et al. (1992) J. Amer. Chem. Soc. 114: 6568), nonpeptidal peptidomimetics with a β-D-glucose
scaffolding (Hirschmann et al. (1992) J. Amer. Chem. Soc. 114: 9217-9218),
analogous organic syntheses of small compound libraries (Chen et al. (1994) J. Amer. Chem.
Soc. 116: 2661), oligocarbamates (Cho et al. (1993) Science 261:1303), and/or peptidyl
phosphonates (Campbell et al. (1994) J. Org. Chem. 59: 658) (see, generally, Gordon et al.
(1994) J. Med. Chem. 37:1385), nucleic acid libraries (see, e.g., Stratagene, Corp.), peptide
nucleic acid libraries (see, e.g., U.S. Patent 5,539,083), antibody libraries (see, e.g., Vaughn
et al. (1996) Nature Biotechnology, 14(3): 309-314, and PCT/US96/10287), carbohydrate
libraries (see, e.g., Liang et al. (1996) Science, 274: 1520-1522, and U.S. Patent No.
5,593,853), and small organic molecule libraries (see, e.g., benzodiazepines, Baum (1993)
C&EN, Jan 18; isoprenoids, U.S. Patent No. 5,569,588; thiazolidinones and
metathiazanones, U.S. Patent No. 5,549,974; pyrrolidines, U.S. Patent Nos. 5,525,735 and
5,519,134; morpholino compounds, U.S. Patent No. 5,506,337; benzodiazepines, U.S. Patent
No. 5,288,514; and the like).
Devices for the preparation of combinatorial libraries are commercially
available (see, e.g., 357 MPS, 390 MPS, Advanced Chem Tech, Louisville, KY; Symphony,
Rainin, Woburn, MA; 433A, Applied Biosystems, Foster City, CA; 9050 Plus, Millipore,
Bedford, MA).

A number of well known robotic systems have also been developed for 
solution phase chemistries. These systems include automated workstations like the 
automated synthesis apparatus developed by Takeda Chemical Industries, LTD. (Osaka, 
Japan) and many robotic systems utilizing robotic arms (Zymate II, Zymark Corporation, 
Hopkinton, Mass.; Orca, Hewlett-Packard, Palo Alto, Calif.), which mimic the manual 
synthetic operations performed by a chemist. Any of the above devices are suitable for use 
with the present invention. The nature and implementation of modifications to these devices 
(if any) so that they can operate as discussed herein will be apparent to persons skilled in the 
relevant art. In addition, numerous combinatorial libraries are themselves commercially
available (see, e.g., ComGenex, Princeton, NJ; Asinex, Moscow, RU; Tripos, Inc., St. Louis,
MO; ChemStar, Ltd., Moscow, RU; 3D Pharmaceuticals, Exton, PA; Martek Biosciences,
Columbia, MD; etc.).

The assays to identify modulators are amenable to high throughput screening.
Preferred assays thus detect enhancement or inhibition of angiogenesis gene transcription,
inhibition or enhancement of polypeptide expression, and inhibition or enhancement of
polypeptide activity.

High throughput assays for the presence, absence, quantification, or other 
properties of particular nucleic acids or protein products are well known to those of skill in 
the art. Binding assays and reporter gene assays are similarly well known. Thus,
for example, U.S. Patent No. 5,559,410 discloses high throughput screening methods for 
proteins, U.S. Patent No. 5,585,639 discloses high throughput screening methods for nucleic 
acid binding (i.e., in arrays), while U.S. Patent Nos. 5,576,220 and 5,541,061 disclose high 
throughput methods of screening for ligand/antibody binding. 

In addition, high throughput screening systems are commercially available 
(see, e.g., Zymark Corp., Hopkinton, MA; Air Technical Industries, Mentor, OH; Beckman 
Instruments, Inc., Fullerton, CA; Precision Systems, Inc., Natick, MA; etc.). These systems
typically automate entire procedures, including all sample and reagent pipetting, liquid 
dispensing, timed incubations, and final readings of the microplate in detector(s) appropriate 
for the assay. These configurable systems provide high throughput and rapid start up as well 
as a high degree of flexibility and customization. The manufacturers of such systems provide 
detailed protocols for various high throughput systems. Thus, for example, Zymark Corp.
provides technical bulletins describing screening systems for detecting the modulation of 
gene transcription, ligand binding, and the like. 

In one embodiment, modulators are proteins, often naturally occurring
proteins or fragments of naturally occurring proteins. Thus, e.g., cellular extracts containing
proteins, or random or directed digests of proteinaceous cellular extracts, may be used. In
this way, libraries of proteins may be made for screening in the methods of the invention.
Particularly preferred in this embodiment are libraries of bacterial, fungal, viral, and
mammalian proteins, with the latter being preferred, and human proteins being especially
preferred. Particularly useful test compounds will be directed to the class of proteins to which
the target belongs, e.g., substrates for enzymes, or ligands and receptors.

In a preferred embodiment, modulators are peptides of from about 5 to about
30 amino acids, with from about 5 to about 20 amino acids being preferred, and from about 7
to about 15 being particularly preferred. The peptides may be digests of naturally occurring
proteins as is outlined above, random peptides, or "biased" random peptides. By
"randomized" or grammatical equivalents herein is meant that each nucleic acid and peptide
consists of essentially random nucleotides and amino acids, respectively. Because these
random peptides (or nucleic acids, discussed below) are generally chemically synthesized, they
may incorporate any nucleotide or amino acid at any position. The synthetic process can be
designed to generate randomized proteins or nucleic acids, allowing the formation of all or
most of the possible combinations over the length of the sequence, and thus forming a library
of randomized candidate bioactive proteinaceous agents.
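Whether a synthesis can actually produce "all or most of the possible combinations" depends strongly on sequence length. The following is a minimal sketch, assuming every randomized position may independently take any of the 20 standard amino acids (or any of 4 nucleotides); the function names are illustrative only.

```python
AMINO_ACIDS = 20  # standard proteinogenic amino acids
NUCLEOTIDES = 4   # A, C, G, T/U


def library_diversity(length: int, alphabet_size: int = AMINO_ACIDS) -> int:
    """Number of distinct sequences of `length` residues when every position is fully randomized."""
    return alphabet_size ** length


def max_fraction_covered(library_members: int, length: int,
                         alphabet_size: int = AMINO_ACIDS) -> float:
    """Upper bound on the fraction of sequence space a library can cover,
    reached only if every member is a different sequence."""
    return min(1.0, library_members / library_diversity(length, alphabet_size))


if __name__ == "__main__":
    # Peptide lengths taken from the ranges discussed above.
    for n in (5, 7, 15, 20, 30):
        print(f"{n:>2}-mer: {library_diversity(n):.2e} possible sequences")
```

For a hypothetical library of 10**9 molecules, `max_fraction_covered(10**9, 5)` is 1.0, whereas for a 15-mer it falls below 1e-10; longer randomized libraries therefore sample, rather than enumerate, the possible sequences.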

In one embodiment, the library is fully randomized, with no sequence 
preferences or constants at any position. In a preferred embodiment, the library is biased. 
That is, some positions within the sequence are either held constant, or are selected from a
limited number of possibilities. For example, in a preferred embodiment, the nucleotides or
amino acid residues are randomized within a defined class, for example, of hydrophobic
amino acids, hydrophilic residues, or sterically biased (either small or large) residues; towards
the creation of nucleic acid binding domains; towards the creation of cysteines for cross-linking,
prolines for SH3 domains, or serines, threonines, tyrosines or histidines for phosphorylation
sites, etc.; or towards purines, etc.
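A biased library of the kind just described can be sketched as a position-by-position template in which some positions are fixed, some are fully randomized, and some are randomized within a residue class. The template notation and the membership of the hydrophobic class below are assumptions made for illustration, not definitions from the text.

```python
import random

ALL_RESIDUES = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
HYDROPHOBIC = "AVILMFW"                 # illustrative hydrophobic class (an assumption)


def biased_peptide(template: str, rng: random.Random) -> str:
    """Build one peptide from a position-by-position template.

    Hypothetical template notation for this sketch:
      'X' -> any standard amino acid (fully randomized position)
      'h' -> a residue drawn from the HYDROPHOBIC class (randomized within a class)
      any other letter -> that residue held constant at this position
    """
    residues = []
    for symbol in template:
        if symbol == "X":
            residues.append(rng.choice(ALL_RESIDUES))
        elif symbol == "h":
            residues.append(rng.choice(HYDROPHOBIC))
        else:
            residues.append(symbol)
    return "".join(residues)


def biased_library(template: str, size: int, seed: int = 0) -> list[str]:
    """Generate `size` peptides from one template; the seed makes runs reproducible."""
    rng = random.Random(seed)
    return [biased_peptide(template, rng) for _ in range(size)]


if __name__ == "__main__":
    # A 9-mer with fixed terminal cysteines (for cross-linking) and a fixed central proline.
    for peptide in biased_library("CXhXPXhXC", size=5):
        print(peptide)
```

A fully randomized library corresponds to a template made only of `X` positions; each fixed or class-restricted position shrinks the sequence space that must be sampled.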

Modulators of angiogenesis can also be nucleic acids, as defined above. 

As described above generally for proteins, nucleic acid modulating agents may 
be naturally occurring nucleic acids, random nucleic acids, or "biased" random nucleic acids. 
For example, digests of procaryotic or eucaryotic genomes may be used as is outlined above
for proteins. 

In a preferred embodiment, the candidate compounds are organic chemical 
moieties, a wide variety of which are available in the literature. 

After the candidate agent has been added and the cells allowed to incubate for 
some period of time, the sample containing a target sequence to be analyzed is added to the 
biochip. If required, the target sequence is prepared using known techniques. For example, 
the sample may be treated to lyse the cells, using known lysis buffers, electroporation, etc., 
with purification and/or amplification such as PCR performed as appropriate. For example, 
an in vitro transcription with labels covalently attached to the nucleotides is performed. 
Generally, the nucleic acids are labeled with biotin-FITC or PE, or with cy3 or cy5. 

In a preferred embodiment, the target sequence is labeled with, for example, a 
fluorescent, a chemiluminescent, a chemical, or a radioactive signal, to provide a means of 
detecting the target sequence's specific binding to a probe. The label also can be an enzyme, 
such as alkaline phosphatase or horseradish peroxidase, which when provided with an
appropriate substrate produces a product that can be detected. Alternatively, the label can be
a labeled compound or small molecule, such as an enzyme inhibitor, that binds but is not
catalyzed or altered by the enzyme. The label also can be a moiety or compound, such as an
epitope tag, or biotin, which specifically binds to streptavidin. For the example of biotin, the
streptavidin is labeled as described above, thereby providing a detectable signal for the
bound target sequence. Unbound labeled streptavidin is typically removed prior to analysis. 

As will be appreciated by those in the art, these assays can be direct 
hybridization assays or can comprise "sandwich assays", which include the use of multiple 
probes, as is generally outlined in U.S. Patent Nos. 5,681,702, 5,597,909, 5,545,730,
5,594,117, 5,591,584, 5,571,670, 5,580,731, 5,624,802, 5,635,352,
5,594,118, 5,359,100, 5,124,246 and 5,681,697, all of which are hereby incorporated by
reference. In this embodiment, in general, the target nucleic acid is prepared as outlined 
above, and then added to the biochip comprising a plurality of nucleic acid probes, under 
conditions that allow the formation of a hybridization complex. 

A variety of hybridization conditions may be used in the present invention, 
including high, moderate and low stringency conditions as outlined above. The assays are 
generally run under stringency conditions that allow formation of the label probe
hybridization complex only in the presence of target. Stringency can be controlled by
altering a step parameter that is a thermodynamic variable, including, but not limited to,
temperature, formamide concentration, salt concentration, chaotropic salt concentration, pH,
organic solvent concentration, etc.

These parameters may also be used to control non-specific binding, as is 
generally outlined in U.S. Patent No. 5,681,697. Thus it may be desirable to perform certain 
steps at higher stringency conditions to reduce non-specific binding. 
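How salt and formamide concentration shift stringency can be illustrated with a widely used empirical melting-temperature approximation for long DNA-DNA hybrids. This formula is not given in the text and is only a rough guide (short oligonucleotide probes need other models); the probe length, GC content, and hybridization temperature below are hypothetical.

```python
import math


def approx_dna_tm(na_molar: float, percent_gc: float,
                  percent_formamide: float, length_bp: int) -> float:
    """Approximate melting temperature (deg C) of a long DNA-DNA hybrid using the
    empirical relation Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 0.61*(%formamide) - 500/L.
    Intended for long hybrids, not short oligonucleotide probes."""
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * percent_gc
            - 0.61 * percent_formamide - 500.0 / length_bp)


def degrees_below_tm(hyb_temp_c: float, tm_c: float) -> float:
    """A smaller gap between hybridization temperature and Tm means higher stringency."""
    return tm_c - hyb_temp_c


if __name__ == "__main__":
    # Hypothetical 500 bp probe, 50% GC, hybridized at 42 deg C.
    for na, formamide in [(1.0, 0), (1.0, 50), (0.1, 50)]:
        tm = approx_dna_tm(na, 50, formamide, 500)
        print(f"[Na+]={na} M, formamide={formamide}%: Tm~{tm:.1f} C, "
              f"{degrees_below_tm(42, tm):.1f} C below Tm")
```

Lowering the salt concentration or raising the formamide concentration lowers Tm, so a wash performed at a fixed temperature becomes more stringent, which is one way the higher-stringency steps mentioned above reduce non-specific binding.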

The reactions outlined herein may be accomplished in a variety of ways. 
Components of the reaction may be added simultaneously, or sequentially, in different orders, 
with preferred embodiments outlined below. In addition, the reaction may include a variety 
of other reagents. These include salts, buffers, neutral proteins (e.g., albumin), detergents, etc.,
which may be used to facilitate optimal hybridization and detection, and/or reduce non-
specific or background interactions. Reagents that otherwise improve the efficiency of the
assay, such as protease inhibitors, nuclease inhibitors, anti-microbial agents, etc., may also be
used as appropriate, depending on the sample preparation methods and purity of the target.

The assay data are analyzed to determine the expression levels of individual genes, and
changes in expression levels between states, forming a gene expression profile.
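One common way to express changes in expression level between two states is a per-gene log2 fold change. The sketch below assumes normalized, non-negative signal values keyed by gene name; the gene names, signal values, pseudocount, and two-fold cutoff are all hypothetical choices, not parameters from the text.

```python
import math


def log2_fold_changes(reference: dict[str, float], test: dict[str, float],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """log2(test / reference) for every gene measured in both states.

    The pseudocount keeps genes with zero signal from causing division by zero."""
    shared = sorted(reference.keys() & test.keys())
    return {g: math.log2((test[g] + pseudocount) / (reference[g] + pseudocount))
            for g in shared}


def differentially_expressed(profile: dict[str, float],
                             min_abs_log2: float = 1.0) -> set[str]:
    """Genes whose |log2 fold change| meets the cutoff (1.0 corresponds to two-fold)."""
    return {g for g, lfc in profile.items() if abs(lfc) >= min_abs_log2}


if __name__ == "__main__":
    # Hypothetical normalized signals for three hypothetical genes.
    normal = {"GENE_A": 100.0, "GENE_B": 50.0, "GENE_C": 10.0}
    angiogenic = {"GENE_A": 100.0, "GENE_B": 203.0, "GENE_C": 1.0}
    profile = log2_fold_changes(normal, angiogenic)
    print(profile)
    print(differentially_expressed(profile))
```

Here GENE_B rises about four-fold and GENE_C falls more than two-fold, so both would be placed in the differential profile, while GENE_A would not.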

Screens are performed to identify modulators of the angiogenesis phenotype. 
In one embodiment, screening is performed to identify modulators that can induce or 
suppress a particular expression profile, thus preferably generating the associated phenotype. 
In another embodiment, e.g., for diagnostic applications, having identified differentially
expressed genes important in a particular state, screens can be performed to identify
modulators that alter expression of individual genes. In another embodiment, screening is
performed to identify modulators that alter a biological function of the expression product of
a differentially expressed gene. Again, having identified the importance of a gene in a
particular state, screens are performed to identify agents that bind and/or modulate the
biological activity of the gene product.

In addition, screens can be done for genes that are induced in response to a
candidate agent. After identifying a modulator based upon its ability to suppress an
angiogenesis expression pattern leading to a normal expression pattern, or to modulate a
single angiogenesis gene expression profile so as to mimic the expression of the gene from
normal tissue, a screen as described above can be performed to identify genes that are
specifically modulated in response to the agent. Comparing expression profiles between
normal tissue and agent-treated angiogenesis tissue reveals genes that are not expressed in
normal tissue or angiogenesis tissue, but are expressed in agent-treated tissue. These agent-
specific sequences can be identified and used by methods described herein for angiogenesis
genes or proteins. In particular, these sequences and the proteins they encode find use in
marking or identifying agent-treated cells. In addition, antibodies can be raised against the
agent-induced proteins and used to target novel therapeutics to the treated angiogenesis tissue
sample.
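The comparison that yields agent-specific genes (expressed in agent-treated tissue, but in neither normal nor untreated angiogenesis tissue) reduces to set operations once each sample has been given presence calls. A minimal sketch, assuming a single hypothetical detection threshold defines "expressed"; the gene names and signal values are illustrative only.

```python
def detected(signals: dict[str, float], threshold: float) -> set[str]:
    """Genes called 'expressed' when their signal is at or above a detection threshold."""
    return {gene for gene, value in signals.items() if value >= threshold}


def agent_specific_genes(normal: dict[str, float], angiogenic: dict[str, float],
                         treated: dict[str, float], threshold: float) -> set[str]:
    """Genes expressed in agent-treated angiogenesis tissue but in neither normal
    nor untreated angiogenesis tissue."""
    untreated = detected(normal, threshold) | detected(angiogenic, threshold)
    return detected(treated, threshold) - untreated


if __name__ == "__main__":
    # Hypothetical signals; 10.0 is a hypothetical detection threshold.
    normal = {"G1": 80.0, "G2": 5.0, "G3": 2.0}
    angiogenic = {"G1": 90.0, "G2": 60.0, "G3": 3.0}
    treated = {"G1": 85.0, "G2": 20.0, "G3": 40.0}
    print(agent_specific_genes(normal, angiogenic, treated, threshold=10.0))
```

In this example only G3 is reported: G1 is expressed in every state and G2 is already expressed in untreated angiogenesis tissue.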

Thus, in one embodiment, a test compound is administered to a population of
angiogenic cells that have an associated angiogenesis expression profile. By
"administration" or "contacting" herein is meant that the candidate agent is added to the cells
in such a manner as to allow the agent to act upon the cell, whether by uptake and
intracellular action, or by action at the cell surface. In some embodiments, nucleic acid
encoding a proteinaceous candidate agent (i.e., a peptide) may be put into a viral construct
such as an adenoviral or retroviral construct, and added to the cell, such that expression of
the peptide agent is accomplished (see, e.g., PCT US97/01019). Regulable gene therapy
systems can also be used.

Once the test compound has been administered to the cells, the cells can be
washed if desired and are allowed to incubate under preferably physiological conditions for
some period of time. The cells are then harvested and a new gene expression profile is
generated, as outlined herein.

Thus, for example, angiogenesis tissue may be screened for agents that
modulate, e.g., induce or suppress, the angiogenesis phenotype. A change in at least one
gene, and preferably many genes, of the expression profile indicates that the agent has an
effect on angiogenesis activity. By defining such a signature for the angiogenesis phenotype,
screens for new drugs that alter the phenotype can be devised. With this approach, the drug
target need not be known and need not be represented in the original expression screening
platform, nor does the level of transcript for the target protein need to change.
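A signature-based screen needs some rule for deciding whether treatment changed the signature genes. One simple, hypothetical rule is to ask, for each signature gene, whether the treated level moved from the angiogenic level toward the normal level. This criterion and all values below are illustrative assumptions; the text itself only requires a change in at least one gene.

```python
def signature_response(normal: dict[str, float], angiogenic: dict[str, float],
                       treated: dict[str, float],
                       signature: list[str]) -> tuple[list[str], float]:
    """For each signature gene, test whether treatment moved its level from the
    angiogenic value toward the normal value.

    Returns (genes_that_moved, fraction_of_signature_that_moved)."""
    moved = [g for g in signature
             if abs(treated[g] - normal[g]) < abs(angiogenic[g] - normal[g])]
    return moved, len(moved) / len(signature)


if __name__ == "__main__":
    # Hypothetical four-gene angiogenesis signature and expression levels.
    signature = ["G1", "G2", "G3", "G4"]
    normal = {"G1": 10.0, "G2": 50.0, "G3": 5.0, "G4": 80.0}
    angiogenic = {"G1": 40.0, "G2": 10.0, "G3": 30.0, "G4": 20.0}
    treated = {"G1": 15.0, "G2": 35.0, "G3": 32.0, "G4": 20.0}
    genes, fraction = signature_response(normal, angiogenic, treated, signature)
    print(genes, fraction)
```

Because the score is computed only from expression levels, it does not require knowing the agent's molecular target, consistent with the point made above.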

Measurement of angiogenesis polypeptide activity, or of angiogenesis or the
angiogenic phenotype, can be performed using a variety of assays. For example, the effects of
the test compounds upon the function of the angiogenesis polypeptides can be measured by
examining parameters described above. A suitable physiological change that affects activity
can be used to assess the influence of a test compound on the polypeptides of this invention.
When the functional consequences are determined using intact cells or animals, one can also
measure a variety of effects such as, in the case of angiogenesis associated with tumors,
tumor growth, neovascularization, hormone release, transcriptional changes to both known
and uncharacterized genetic markers (e.g., by northern blots), changes in cell metabolism such
as cell growth or pH changes, and changes in intracellular second messengers such as cGMP.
In the assays of the invention, mammalian angiogenesis polypeptide is typically used, e.g.,
mouse, preferably human.

A variety of angiogenesis assays are known to those of skill in the art. Various
models have been employed to evaluate angiogenesis (e.g., Croix et al., Science 289:1197-
1202, 2000; and Kahn et al., Amer. J. Pathol. 156:1887-1900). Assessment of angiogenesis
in the presence of a potential modulator of angiogenesis can be performed using cell-culture-
based angiogenesis assays, e.g., endothelial cell tube formation assays, as well as other
bioassays such as the chick CAM assay, the mouse corneal assay, and assays measuring the
effect of administering potential modulators on implanted tumors. The chick CAM assay is
described by O'Reilly et al., Cell 79:315-328, 1994. Briefly, 3-day-old chicken embryos with
intact yolks are separated from the egg and placed in a petri dish. After 3 days of incubation,
a methylcellulose disc containing the protein to be tested is applied to the CAM of individual
embryos. After about 48 hours of incubation, the embryos and CAMs are observed to 
determine whether endothelial growth has been inhibited. The mouse corneal assay involves 
implanting a growth factor-containing pellet, along with another pellet containing the 
suspected endothelial growth inhibitor, in the cornea of a mouse and observing the pattern of 
capillaries that are elaborated in the cornea. Angiogenesis can also be measured by 
determining the extent of neovascularization of a tumor. For example, carcinoma cells can be 
subcutaneously inoculated into athymic nude mice and tumor growth then monitored. The
cancer cells can be treated with an exogenously administered angiogenesis inhibitor, such as
an antibody or other compound, or can be transfected prior to inoculation with a
polynucleotide inhibitor of angiogenesis. Immunoassays using endothelial cell-specific
antibodies are typically used to stain for tumor vascularization and to determine the number
of vessels in the tumor.

Assays to identify compounds with modulating activity can be performed in 
vitro. For example, an angiogenesis polypeptide is first contacted with a potential modulator
and incubated for a suitable amount of time, e.g., from 0.5 to 48 hours. In one embodiment,
the angiogenesis polypeptide levels are determined in vitro by measuring the level of protein
or mRNA. The level of protein is measured using immunoassays such as western blotting,
ELISA and the like with an antibody that selectively binds to the angiogenesis polypeptide or
a fragment thereof. For measurement of mRNA, amplification, e.g., using PCR or LCR, or
hybridization assays, e.g., northern hybridization, RNase protection, or dot blotting, are
preferred. The level of protein or mRNA is detected using directly or indirectly labeled
detection agents, e.g., fluorescently or radioactively labeled nucleic acids, radioactively or
enzymatically labeled antibodies, and the like, as described herein.

Alternatively, a reporter gene system can be devised using the angiogenesis 
protein promoter operably linked to a reporter gene such as luciferase, green fluorescent 
protein, CAT, or β-gal. The reporter construct is typically transfected into a cell. After
treatment with a potential modulator, the amount of reporter gene transcription, translation, or 
activity is measured according to standard techniques known to those of skill in the art. 
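Reporter readouts (e.g., luciferase light units) are usually reported relative to untreated or vehicle-treated cells. The sketch below assumes replicate wells for treated, vehicle, and background conditions (for example, cells lacking the reporter); the normalization shown and the example values are assumptions for illustration, not a procedure specified in the text.

```python
from statistics import mean


def percent_of_vehicle(treated: list[float], vehicle: list[float],
                       background: list[float]) -> float:
    """Background-subtracted reporter signal in treated wells as a percentage of
    vehicle-treated wells (100 = no effect, <100 = reduced reporter activity)."""
    bg = mean(background)
    window = mean(vehicle) - bg
    if window <= 0:
        raise ValueError("vehicle signal does not exceed background")
    return 100.0 * (mean(treated) - bg) / window


if __name__ == "__main__":
    # Hypothetical triplicate luciferase readings (relative light units).
    result = percent_of_vehicle(treated=[550, 560, 540],
                                vehicle=[1000, 1100, 900],
                                background=[100, 90, 110])
    print(f"{result:.1f}% of vehicle")  # a potential inhibitor of promoter activity
```

A value well below 100% suggests the candidate reduces transcription or translation from the angiogenesis protein promoter; a value well above 100% suggests enhancement.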

In a preferred embodiment, as outlined above, screens may be done on 
individual genes and gene products (proteins). That is, having identified a particular 
differentially expressed gene as important in a particular state, screening of modulators of the 
expression of the gene or the gene product itself can be done. The gene products of 
differentially expressed genes are sometimes referred to herein as "angiogenesis proteins". In 
preferred embodiments the angiogenesis protein comprises a sequence shown in Table 2. 
The angiogenesis protein may be a fragment, or alternatively, be the full-length protein
corresponding to a fragment shown herein.

Preferably, the angiogenesis protein is a fragment of approximately 14 to 24 
amino acids long. More preferably the fragment is a soluble fragment. In one embodiment 
an angiogenesis protein is conjugated to an immunogenic agent or BSA. 

In one embodiment, screening for modulators of expression of specific genes 
is performed. Typically, the expression of only one or a few genes is evaluated. In another
embodiment, screens are designed to first find compounds that bind to differentially
expressed proteins. These compounds are then evaluated for the ability to modulate
differentially expressed activity. Moreover, once initial candidate compounds are identified,
variants can be further screened to better evaluate structure-activity relationships.

In a preferred embodiment, binding assays are done. In general, purified or 
isolated gene product is used; that is, the gene products of one or more differentially 
expressed nucleic acids are made. For example, antibodies are generated to the protein gene 
products, and standard immunoassays are run to determine the amount of protein present. 
Alternatively, cells comprising the angiogenesis proteins can be used in the assays. 

Thus, in a preferred embodiment, the methods comprise combining an
angiogenesis protein and a candidate compound, and determining the binding of the
compound to the angiogenesis protein. Preferred embodiments utilize the human
angiogenesis protein, although other mammalian proteins may also be used, for example for
the development of animal models of human disease. In some embodiments, as outlined
herein, variant or derivative angiogenesis proteins may be used.

Generally, in a preferred embodiment of the methods herein, the angiogenesis 
protein or the candidate agent is non-diffusably bound to an insoluble support having isolated 
sample receiving areas (e.g. a microtiter plate, an array, etc.). The insoluble supports may be 
made of any composition to which the compositions can be bound, that is readily separated
from soluble material, and that is otherwise compatible with the overall method of screening.
The surface of such supports may be solid or porous and of any convenient shape. Examples of
suitable insoluble supports include microtiter plates, arrays, membranes and beads. These are
typically made of glass, plastic (e.g., polystyrene), polysaccharides, nylon or nitrocellulose,
Teflon™, etc. Microtiter plates and arrays are especially convenient because a large number
of assays can be carried out simultaneously, using small amounts of reagents and samples. 
The particular manner of binding of the composition is not crucial so long as it is compatible 
with the reagents and overall methods of the invention, maintains the activity of the 
composition and is nondiffusable. Preferred methods of binding include the use of antibodies 
(which do not sterically block either the ligand binding site or activation sequence when the 
protein is bound to the support), direct binding to "sticky" or ionic supports, chemical 
crosslinking, the synthesis of the protein or agent on the surface, etc. Following binding of 
the protein or agent, excess unbound material is removed by washing. The sample receiving 
areas may then be blocked through incubation with bovine serum albumin (BSA), casein or 
other innocuous protein or other moiety. 

In a preferred embodiment, the angiogenesis protein is bound to the support, 
and a test compound is added to the assay. Alternatively, the candidate agent is bound to the 
support and the angiogenesis protein is added. Novel binding agents include specific 
antibodies, non-natural binding agents identified in screens of chemical libraries, peptide 
analogs, etc. Of particular interest are screening assays for agents that have a low toxicity for 
human cells. A wide variety of assays may be used for this purpose, including labeled in 
vitro protein-protein binding assays, electrophoretic mobility shift assays, immunoassays for 
protein binding, functional assays (phosphorylation assays, etc.) and the like. 

The determination of the binding of the test modulating compound to the
angiogenesis protein may be done in a number of ways. In a preferred embodiment, the
compound is labeled, and binding is determined directly, e.g., by attaching all or a portion of
the angiogenesis protein to a solid support, adding a labeled candidate agent (e.g., a
fluorescent label), washing off excess reagent, and determining whether the label is present
on the solid support. Various blocking and washing steps may be utilized as appropriate. 

By "labeled" herein is meant that the compound is either directly or indirectly 
labeled with a label which provides a detectable signal, e.g., a radioisotope, fluorescer,
enzyme, antibody, particle such as a magnetic particle, chemiluminescer, or specific
binding molecule, etc. Specific binding molecules include pairs, such as biotin and
streptavidin, digoxin and antidigoxin, etc. For the specific binding members, the
complementary member would normally be labeled with a molecule which provides for
detection, in accordance with known procedures, as outlined above. The label can directly or
indirectly provide a detectable signal.

In some embodiments, only one of the components is labeled, e.g., the
proteins (or proteinaceous candidate compounds) can be labeled. Alternatively, more than
one component can be labeled with different labels, e.g., 125I for the proteins and a
fluorophore for the compound. Proximity reagents, e.g., quenching or energy transfer
reagents, are also useful.

In one embodiment, the binding of the test compound is determined by a
competitive binding assay. The competitor is a binding moiety known to bind to the target
molecule (i.e. an angiogenesis protein), such as an antibody, peptide, binding partner, ligand, 
etc. Under certain circumstances, there may be competitive binding between the compound
and the binding moiety, with the binding moiety displacing the compound. In one
embodiment, the test compound is labeled. Either the compound, or the competitor, or both,
is added first to the protein for a time sufficient to allow binding, if present. Incubations may
be performed at a temperature which facilitates optimal activity, typically between 4 and
40°C. Incubation periods are typically optimized, e.g., to facilitate rapid high throughput
screening. Typically, between 0.1 and 1 hour will be sufficient. Excess reagent is generally
removed or washed away. The second component is then added, and the presence or absence 
of the labeled component is followed, to indicate binding. 

In a preferred embodiment, the competitor is added first, followed by the test 
compound. Displacement of the competitor is an indication that the test compound is binding 
to the angiogenesis protein and thus is capable of binding to, and potentially modulating, the
activity of the angiogenesis protein. In this embodiment, either component can be labeled. 
Thus, for example, if the competitor is labeled, the presence of label in the wash solution 
indicates displacement by the agent. Alternatively, if the test compound is labeled, the 
presence of the label on the support indicates displacement. 
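When the competitor is the labeled component, displacement can be quantified from the label remaining on the support with and without the test compound (label released into the wash is the mirror image of this loss). A minimal sketch, assuming support-bound counts (e.g., cpm) and a separately measured non-specific binding value; the names and numbers are hypothetical.

```python
def percent_displacement(bound_without_compound: float, bound_with_compound: float,
                         nonspecific: float = 0.0) -> float:
    """Percent of specifically bound labeled competitor displaced by the test compound.

    bound_* are support-bound signals for the labeled competitor; `nonspecific` is
    the signal that remains bound regardless of specific binding (assumed measured
    separately, e.g., with a large excess of unlabeled competitor)."""
    specific_reference = bound_without_compound - nonspecific
    specific_test = bound_with_compound - nonspecific
    if specific_reference <= 0:
        raise ValueError("no specific competitor binding to displace")
    return 100.0 * (1.0 - specific_test / specific_reference)


if __name__ == "__main__":
    # Hypothetical counts: 10,000 cpm bound without compound, 4,000 with, 1,000 non-specific.
    print(f"{percent_displacement(10_000, 4_000, 1_000):.1f}% displaced")
```

Subtracting non-specific binding first keeps background label that no compound can displace from diluting the result.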

In an alternative embodiment, the test compound is added first, with
incubation and washing, followed by the competitor. The absence of binding by the
competitor may indicate that the test compound is bound to the angiogenesis protein with a
higher affinity. Thus, if the test compound is labeled, the presence of the label on the
support, coupled with a lack of competitor binding, may indicate that the test compound is
capable of binding to the angiogenesis protein. 

In a preferred embodiment, the methods comprise differential screening to
identify agents that are capable of modulating the activity of the angiogenesis proteins. In
this embodiment, the methods comprise combining an angiogenesis protein and a competitor
in a first sample. A second sample comprises a test compound, an angiogenesis protein, and
a competitor. The binding of the competitor is determined for both samples, and a change, or
difference, in binding between the two samples indicates the presence of an agent capable of
binding to the angiogenesis protein and potentially modulating its activity. That is, if the
binding of the competitor is different in the second sample relative to the first sample, the
agent is capable of binding to the angiogenesis protein.
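The two-sample comparison can be sketched as follows, assuming replicate measurements of competitor binding for each sample. The 20% minimum fractional change used to call a difference is a hypothetical cutoff; in practice it would be set from the assay's own variability.

```python
from statistics import mean


def competitor_binding_changed(first_sample: list[float], second_sample: list[float],
                               min_fraction_change: float = 0.2) -> tuple[bool, float]:
    """Compare competitor binding without (first sample) and with (second sample)
    the test compound.

    Returns (changed, fractional_change); a negative change means less competitor
    bound in the presence of the test compound."""
    m1, m2 = mean(first_sample), mean(second_sample)
    if m1 == 0:
        raise ValueError("no competitor binding in the first sample")
    fraction = (m2 - m1) / m1
    return abs(fraction) >= min_fraction_change, fraction


if __name__ == "__main__":
    # Hypothetical triplicate competitor signals.
    print(competitor_binding_changed([5000, 5200, 4800], [2600, 2400, 2500]))
```

Reporting the signed fractional change, not only a yes/no call, also preserves whether the compound reduced or increased competitor binding.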

Alternatively, differential screening is used to identify drug candidates that
bind to the native angiogenesis protein, but cannot bind to modified angiogenesis proteins.
The structure of the angiogenesis protein may be modeled, and used in rational drug design to
synthesize agents that interact with that site. Drug candidates that affect the activity of an
angiogenesis protein are also identified by screening drugs for the ability to either enhance or
reduce the activity of the protein.

Positive controls and negative controls may be used in the assays. Preferably,
control and test samples are run in at least triplicate to obtain statistically significant
results. All samples are incubated for a time sufficient for the binding of the agent to the
protein. Following incubation, samples are washed free of non-specifically bound material
and the amount of bound, generally labeled, agent is determined. For example, where a
radiolabel is employed, the samples may be counted in a scintillation counter to determine the
amount of bound compound.
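Replicate positive and negative controls also make it possible to judge whether an assay separates signal from background well enough for screening. One widely used statistic for this, not mentioned in the text, is the Z'-factor; the sketch below computes it from control replicates, and the example counts are hypothetical.

```python
from statistics import mean, stdev


def z_prime(positive: list[float], negative: list[float]) -> float:
    """Z'-factor = 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg|.

    Values closer to 1 indicate better separation between the control distributions."""
    if len(positive) < 3 or len(negative) < 3:
        raise ValueError("use at least triplicate controls")
    separation = abs(mean(positive) - mean(negative))
    if separation == 0:
        raise ValueError("positive and negative controls are indistinguishable")
    return 1.0 - 3.0 * (stdev(positive) + stdev(negative)) / separation


if __name__ == "__main__":
    # Hypothetical scintillation counts (cpm) for triplicate controls.
    print(round(z_prime([100, 102, 98], [10, 12, 8]), 3))
```

Values above about 0.5 are commonly treated as indicating a robust screening window, though the acceptable threshold is a judgment made for each assay.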

A variety of other reagents may be included in the screening assays. These
include reagents like salts, neutral proteins (e.g., albumin), detergents, etc., which may be
used to facilitate optimal protein-protein binding and/or reduce non-specific or background
interactions. Reagents that otherwise improve the efficiency of the assay, such as
protease inhibitors, nuclease inhibitors, anti-microbial agents, etc., may also be used. The
mixture of components may be added in any order that provides for the requisite binding.

In a preferred embodiment, the invention provides methods for screening for a
compound capable of modulating the activity of an angiogenesis protein. The methods
comprise adding a test compound, as defined above, to a cell comprising angiogenesis
proteins. Almost any cell type may be used. The cells contain a recombinant
nucleic acid that encodes an angiogenesis protein. In a preferred embodiment, a library of
candidate agents is tested on a plurality of cells.

In one aspect, the assays are evaluated in the presence or absence of, or after previous
or subsequent exposure to, physiological signals, for example hormones, antibodies, peptides,
antigens, cytokines, growth factors, action potentials, pharmacological agents including
chemotherapeutics, radiation, carcinogens, or other cells (i.e., cell-cell contacts). In another
example, the determinations are made at different stages of the cell cycle process.

In this way, compounds that modulate angiogenesis agents are identified.
Compounds with pharmacological activity are able to enhance or interfere with the activity of
the angiogenesis protein. Once identified, similar structures are evaluated to identify critical
structural features of the compound.

In one embodiment, a method of inhibiting angiogenic cell division is 
provided. The method comprises administration of an angiogenesis inhibitor. In another 
embodiment, a method of inhibiting angiogenesis is provided. The method comprises 
administration of an angiogenesis inhibitor. In a further embodiment, methods of treating
cells or individuals with angiogenesis are provided. The method comprises administration of
an angiogenesis inhibitor.

In one embodiment, an angiogenesis inhibitor is an antibody as discussed 
above. In another embodiment, the angiogenesis inhibitor is an antisense molecule. 

Polynucleotide modulators of angiogenesis

Antisense Polynucleotides 

In certain embodiments, the activity of an angiogenesis-associated protein is 
downregulated, or entirely inhibited, by the use of an antisense polynucleotide, i.e., a nucleic
acid complementary to, and which can preferably hybridize specifically to, a coding mRNA
nucleic acid sequence, e.g., an angiogenesis protein mRNA, or a subsequence thereof.
Binding of the antisense polynucleotide to the mRNA reduces the translation and/or stability
of the mRNA.

In the context of this invention, antisense polynucleotides can comprise 
naturally-occurring nucleotides, or synthetic species formed from naturally-occurring 
subunits or their close homologs. Antisense polynucleotides may also have altered sugar
moieties or inter-sugar linkages. Exemplary among these are the phosphorothioate and other
sulfur-containing species which are known for use in the art. Analogs are comprehended by
this invention so long as they function effectively to hybridize with the angiogenesis protein
mRNA. See, e.g., Isis Pharmaceuticals, Carlsbad, CA; Sequitur, Inc., Natick, MA.

Such antisense polynucleotides can readily be synthesized using recombinant 
means, or can be synthesized in vitro. Equipment for such synthesis is sold by several 
vendors, including Applied Biosystems. The preparation of other oligonucleotides such as 
phosphorothioates and alkylated derivatives is also well known to those of skill in the art. 

Antisense molecules as used herein include antisense or sense 
oligonucleotides. Sense oligonucleotides can, e.g., be employed to block transcription by
binding to the antisense strand. The antisense and sense oligonucleotides comprise a single-
stranded nucleic acid sequence (either RNA or DNA) capable of binding to target mRNA
(sense) or DNA (antisense) sequences for angiogenesis molecules. A preferred antisense
molecule is directed to an angiogenesis sequence in Table 1, or to a ligand or activator thereof.
Antisense or sense oligonucleotides, according to the present invention, comprise a fragment
of generally at least about 14 nucleotides, preferably from about 14 to 30 nucleotides. The
ability to derive an antisense or a sense oligonucleotide, based upon a cDNA sequence 
encoding a given protein is described in, for example, Stein and Cohen (Cancer Res. 48:2659, 
1988) and van der Krol et al. (BioTechniques 6:958, 1988). 
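Deriving an antisense oligonucleotide from a cDNA or mRNA sequence amounts, at its simplest, to taking the reverse complement of a chosen window within the stated length range. The sketch below assumes an unmodified DNA oligonucleotide and a hypothetical 20-nt target; it ignores the target-site selection, secondary-structure, and chemistry considerations covered in the cited references.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def antisense_oligo(mrna: str, start: int, length: int = 20) -> str:
    """DNA antisense oligonucleotide, written 5'->3', complementary to
    mrna[start:start + length]. The target is given 5'->3'; U or T are accepted."""
    if not 14 <= length <= 30:
        raise ValueError("length outside the ~14-30 nt range described above")
    target = mrna.upper()[start:start + length]
    if len(target) != length:
        raise ValueError("window runs past the end of the sequence")
    # Reverse the target, then complement each base, so the oligo pairs antiparallel.
    return "".join(COMPLEMENT[base] for base in reversed(target))


if __name__ == "__main__":
    # Hypothetical 20-nt mRNA segment (not a sequence from Table 1).
    print(antisense_oligo("AUGGCUAGCUUAGGCCAUUA", start=0, length=20))
```

Reading the output 3'->5' and pairing it base by base with the target confirms the antiparallel complementarity needed for hybridization.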

Ribozymes 

In addition to antisense polynucleotides, ribozymes can be used to target and
inhibit expression of angiogenesis-associated nucleotide sequences. A ribozyme is an RNA
molecule that catalytically cleaves other RNA molecules. Different kinds of ribozymes have
been described, including group I ribozymes, hammerhead ribozymes, hairpin ribozymes,
RNase P, and axhead ribozymes (see, e.g., Castanotto et al. (1994) Adv. in Pharmacology 25:
289-317 for a general review of the properties of different ribozymes).

The general features of hairpin ribozymes are described, e.g., in Hampel et al.
(1990) Nucl. Acids Res. 18: 299-304; Hampel et al. (1990) European Patent Publication No. 0
360 257; U.S. Patent No. 5,254,678. Methods of preparing ribozymes are well known to those
of skill in the art (see, e.g., Wong-Staal et al., WO 94/26877; Ojwang et al. (1993) Proc.
Natl. Acad. Sci. USA 90: 6340-6344; Yamada et al. (1994) Human Gene Therapy 1: 39-45;
Leavitt et al. (1995) Proc. Natl. Acad. Sci. USA 92: 699-703; Leavitt et al. (1994) Human
Gene Therapy 5: 1151-120; and Yamada et al. (1994) Virology 205: 121-126).

Polynucleotide modulators of angiogenesis may be introduced into a cell
containing the target nucleotide sequence by formation of a conjugate with a ligand binding
molecule, as described in WO 91/04753. Suitable ligand binding molecules include, but are
not limited to, cell surface receptors, growth factors, other cytokines, or other ligands that
bind to cell surface receptors. Preferably, conjugation of the ligand binding molecule does
not substantially interfere with the ability of the ligand binding molecule to bind to its
corresponding molecule or receptor, and does not block entry of the sense or antisense
oligonucleotide or its conjugated version into the cell. Alternatively, a polynucleotide
modulator of angiogenesis may be introduced into a cell containing the target nucleic acid
sequence, e.g., by formation of a polynucleotide-lipid complex, as described in WO 90/10448.
It is understood that antisense molecules, as well as knock-out and knock-in models, may also
be used in the screening assays discussed above, in addition to methods of treatment.

Thus, in one embodiment, methods of modulating angiogenesis in cells or
organisms are provided. In one embodiment, the methods comprise administering to a cell an
anti-angiogenesis antibody that reduces or eliminates the biological activity of an
endogenous angiogenesis protein. Alternatively, the methods comprise administering to a
cell or organism a recombinant nucleic acid encoding an angiogenesis protein. This may be
accomplished in any number of ways. In a preferred embodiment, for example when the
angiogenesis sequence is down-regulated in angiogenesis, such state may be reversed by
increasing the amount of angiogenesis gene product in the cell. This can be accomplished,
e.g., by overexpressing the endogenous angiogenesis gene or by administering a gene encoding
the angiogenesis sequence using known gene-therapy techniques. In a
preferred embodiment, the gene therapy techniques include the incorporation of the
exogenous gene using enhanced homologous recombination (EHR), for example as described
in PCT/US93/03868, hereby incorporated by reference in its entirety. Alternatively, for
example when the angiogenesis sequence is up-regulated in angiogenesis, the activity of the
endogenous angiogenesis gene is decreased, for example by the administration of an
angiogenesis antisense nucleic acid.

In one embodiment, the angiogenesis proteins of the present invention may be
used to generate polyclonal and monoclonal antibodies to angiogenesis proteins. Similarly,
the angiogenesis proteins can be coupled, using standard technology, to affinity
chromatography columns. These columns may then be used to purify angiogenesis
antibodies useful for production, diagnostic, or therapeutic purposes. In a preferred
embodiment, the antibodies are generated to epitopes unique to an angiogenesis protein; that
is, the antibodies show little or no cross-reactivity to other proteins. The angiogenesis
antibodies may be coupled to standard affinity chromatography columns and used to purify
angiogenesis proteins. The antibodies may also be used as blocking polypeptides, as outlined
above, since they will specifically bind to the angiogenesis protein.

Methods of identifying variant angiogenesis-associated sequences 

Without being bound by theory, expression of various angiogenesis sequences
is correlated with angiogenesis. Accordingly, disorders based on mutant or variant
angiogenesis genes may be determined. In one embodiment, the invention provides methods
for identifying cells containing variant angiogenesis genes, e.g., determining all or part of the
sequence of at least one endogenous angiogenesis gene in a cell. This may be
accomplished using any number of sequencing techniques. In a preferred embodiment, the
invention provides methods of identifying the angiogenesis genotype of an individual, e.g.,
determining all or part of the sequence of at least one angiogenesis gene of the individual.
This is generally done in at least one tissue of the individual, and may include the evaluation
of a number of tissues or different samples of the same tissue. The method may include
comparing the sequence of the sequenced angiogenesis gene to a known angiogenesis gene,
i.e., a wild-type gene.

The sequence of all or part of the angiogenesis gene can then be compared to
the sequence of a known angiogenesis gene to determine if any differences exist. This can be
done using any number of known homology programs, such as Bestfit. In a preferred
embodiment, the presence of a difference in the sequence between the angiogenesis gene of
the patient and the known angiogenesis gene correlates with a disease state or a propensity
for a disease state, as outlined herein.
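The comparison step can be illustrated with a short sketch. Bestfit is a local-alignment program from the GCG package. As a stand-in, the hypothetical helper below uses Python's standard-library `difflib.SequenceMatcher` to list substitutions, insertions and deletions between a sequenced gene and a wild-type reference. `SequenceMatcher` is a generic sequence-matching algorithm, not a biological aligner: it has no substitution matrix and no gap penalties. The sketch is illustrative only; real variant detection would use a dedicated aligner.

```python
from difflib import SequenceMatcher


def sequence_differences(reference: str, sample: str) -> list:
    """List differences between a reference (e.g., wild-type) sequence and a
    sequenced sample as (kind, ref_start, ref_end, ref_bases, sample_bases).

    `kind` is 'replace', 'delete' or 'insert'; coordinates are 0-based,
    end-exclusive positions on the reference.
    """
    ref = reference.upper()
    smp = sample.upper()
    # autojunk=False so long sequences are not subjected to the junk heuristic.
    matcher = SequenceMatcher(None, ref, smp, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((tag, i1, i2, ref[i1:i2], smp[j1:j2]))
    return diffs


# Example: a single G-to-A substitution at reference position 4.
print(sequence_differences("ATGCGTACGT", "ATGCATACGT"))
# [('replace', 4, 5, 'G', 'A')]
```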

In a preferred embodiment, the angiogenesis genes are used as probes to 
determine the number of copies of the angiogenesis gene in the genome. 
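One common way to turn probe signals into a copy-number estimate is to normalize the signal from the angiogenesis gene probe against a probe for a reference locus of known copy number. The sketch below assumes a diploid reference locus (two copies) and equal hybridization efficiency for both probes. These assumptions, the function name and the example signal values are illustrative and do not come from the text.

```python
def estimate_copy_number(target_signal: float, reference_signal: float,
                         reference_copies: int = 2) -> float:
    """Estimate copies of a target locus from two probe signals.

    Assumes (hypothetically) that the reference probe detects a locus present
    at `reference_copies` per genome and that both probes hybridize with
    equal efficiency, so copy number scales with the signal ratio.
    """
    if reference_signal <= 0:
        raise ValueError("reference_signal must be positive")
    return reference_copies * target_signal / reference_signal


# Twice the reference signal at a diploid reference suggests about 4 copies.
print(estimate_copy_number(300.0, 150.0))  # 4.0
```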

In another preferred embodiment, the angiogenesis genes are used as probes to
determine the chromosomal localization of the angiogenesis genes. Information such as
chromosomal localization finds use in providing a diagnosis or prognosis, particularly when
chromosomal abnormalities, such as translocations, are identified in the
angiogenesis gene locus.

Administration of pharmaceutical and vaccine compositions

In one embodiment, a therapeutically effective dose of an angiogenesis protein
or modulator thereof is administered to a patient. By "therapeutically effective dose" herein
is meant a dose that produces the effects for which it is administered. The exact dose will
depend on the purpose of the treatment, and will be ascertainable by one skilled in the art
using known techniques (e.g., Ansel et al., Pharmaceutical Dosage Forms and Drug Delivery,
Lippincott, Williams & Wilkins Publishers, ISBN 0683305727; Lieberman (1992)
Pharmaceutical Dosage Forms (vols. 1-3), Dekker, ISBN 0824770846, 082476918X,
0824712692, 0824716981; Lloyd (1999) The Art, Science and Technology of Pharmaceutical
Compounding, Amer. Pharmaceutical Assn, ISBN 0917330889; and Pickar (1999) Dosage
Calculations, Delmar Pub, ISBN 0766805042). As is known in the art, adjustments for
angiogenesis protein degradation, systemic versus localized delivery, and rate of new protease
synthesis, as well as the age, body weight, general health, sex, diet, time of administration,
drug interaction and the severity of the condition may be necessary, and will be ascertainable
with routine experimentation by those skilled in the art.

A "patient" for the purposes of the present invention includes both humans and 
other animals, particularly mammals. Thus the methods are applicable to both human 
therapy and veterinary applications. In the preferred embodiment the patient is a mammal, 
preferably a primate, and in the most preferred embodiment the patient is human. 

The administration of the angiogenesis proteins and modulators thereof of the 
present invention can be done in a variety of ways as discussed above, including, but not 
limited to, orally, subcutaneously, intravenously, intranasally, transdermally, 
intraperitoneally, intramuscularly, intrapulmonary, vaginally, rectally, or intraocularly. In 
some instances, for example, in the treatment of wounds and inflammation, the angiogenesis 
proteins and modulators may be directly applied as a solution or spray. 

The pharmaceutical compositions of the present invention comprise an
angiogenesis protein in a form suitable for administration to a patient. In the preferred
embodiment, the pharmaceutical compositions are in a water soluble form, such as being
present as pharmaceutically acceptable salts, which is meant to include both acid and base
addition salts. "Pharmaceutically acceptable acid addition salt" refers to those salts that retain
the biological effectiveness of the free bases and that are not biologically or otherwise
undesirable, formed with inorganic acids such as hydrochloric acid, hydrobromic acid,
sulfuric acid, nitric acid, phosphoric acid and the like, and organic acids such as acetic acid,
propionic acid, glycolic acid, pyruvic acid, oxalic acid, maleic acid, malonic acid, succinic
acid, fumaric acid, tartaric acid, citric acid, benzoic acid, cinnamic acid, mandelic acid,
methanesulfonic acid, ethanesulfonic acid, p-toluenesulfonic acid, salicylic acid and the like.
"Pharmaceutically acceptable base addition salts" include those derived from inorganic bases
such as sodium, potassium, lithium, ammonium, calcium, magnesium, iron, zinc, copper,
manganese, aluminum salts and the like. Particularly preferred are the ammonium,
potassium, sodium, calcium, and magnesium salts. Salts derived from pharmaceutically
acceptable organic non-toxic bases include salts of primary, secondary, and tertiary amines,
substituted amines including naturally occurring substituted amines, cyclic amines and basic
ion exchange resins, such as isopropylamine, trimethylamine, diethylamine, triethylamine,
tripropylamine, and ethanolamine.

The pharmaceutical compositions may also include one or more of the 
following: carrier proteins such as serum albumin; buffers; fillers such as microcrystalline 
cellulose, lactose, corn and other starches; binding agents; sweeteners and other flavoring 
agents; coloring agents; and polyethylene glycol. 

The pharmaceutical compositions can be administered in a variety of unit
dosage forms depending upon the method of administration. For example, unit dosage forms
suitable for oral administration include, but are not limited to, powder, tablets, pills, capsules
and lozenges. It is recognized that angiogenesis protein modulators (e.g., antibodies,
antisense constructs, ribozymes, small organic molecules, etc.), when administered orally,
should be protected from digestion. This is typically accomplished either by complexing the
molecule(s) with a composition to render it resistant to acidic and enzymatic hydrolysis, or by
packaging the molecule(s) in an appropriately resistant carrier, such as a liposome or a
protection barrier. Means of protecting agents from digestion are well known in the art.

The compositions for administration will commonly comprise an angiogenesis
protein modulator dissolved in a pharmaceutically acceptable carrier, preferably an aqueous
carrier. A variety of aqueous carriers can be used, e.g., buffered saline and the like. These
solutions are sterile and generally free of undesirable matter. These compositions may be
sterilized by conventional, well known sterilization techniques. The compositions may
contain pharmaceutically acceptable auxiliary substances as required to approximate
physiological conditions, such as pH adjusting and buffering agents, tonicity adjusting agents
and the like, for example, sodium acetate, sodium chloride, potassium chloride, calcium
chloride, sodium lactate and the like. The concentration of active agent in these formulations
can vary widely, and will be selected primarily based on fluid volumes, viscosities, body
weight and the like in accordance with the particular mode of administration selected and the
patient's needs (e.g., Remington's Pharmaceutical Science, 15th ed., Mack Publishing
Company, Easton, Pennsylvania (1980) and Goodman and Gilman, The Pharmacological
Basis of Therapeutics (Hardman, J.G., Limbird, L.E., Molinoff, P.B., Ruddon, R.W., and
Gilman, A.G., eds.), The McGraw-Hill Companies, Inc., 1996).

Thus, a typical pharmaceutical composition for intravenous administration
would be about 0.1 to 10 mg per patient per day. Dosages from 0.1 up to about 100 mg per
patient per day may be used, particularly when the drug is administered to a secluded site and
not into the blood stream, such as into a body cavity or into a lumen of an organ.
Substantially higher dosages are possible in topical administration. Actual methods for
preparing parenterally administrable compositions will be known or apparent to those skilled
in the art, e.g., Remington's Pharmaceutical Science and Goodman and Gilman, The
Pharmacological Basis of Therapeutics, supra.

The compositions containing modulators of angiogenesis proteins can be
administered for therapeutic or prophylactic treatments. In therapeutic applications,
compositions are administered to a patient suffering from a disease (e.g., a cancer) in an
amount sufficient to cure or at least partially arrest the disease and its complications. An
amount adequate to accomplish this is defined as a "therapeutically effective dose." Amounts
effective for this use will depend upon the severity of the disease and the general state of the
patient's health. Single or multiple administrations of the compositions may be given
depending on the dosage and frequency as required and tolerated by the patient. In any event,
the composition should provide a sufficient quantity of the agents of this invention to
effectively treat the patient. An amount of modulator that is capable of preventing or slowing
the development of cancer in a mammal is referred to as a "prophylactically effective dose."
The particular dose required for a prophylactic treatment will depend upon the medical
condition and history of the mammal, the particular cancer being prevented, as well as other
factors such as age, weight, gender, administration route, efficiency, etc. Such prophylactic
treatments may be used, e.g., in a mammal who has previously had cancer to prevent a
recurrence of the cancer, or in a mammal who is suspected of having a significant likelihood
of developing cancer.

It will be appreciated that the present angiogenesis protein-modulating
compounds can be administered alone or in combination with additional angiogenesis-
modulating compounds or with other therapeutic agents, e.g., other anti-cancer agents or
treatments.

In numerous embodiments, one or more nucleic acids, e.g., polynucleotides
comprising nucleic acid sequences set forth in Table 1, such as antisense polynucleotides or
ribozymes, will be introduced into cells, in vitro or in vivo. The present invention provides
methods, reagents, vectors, and cells useful for expression of angiogenesis-associated
polypeptides and nucleic acids using in vitro (cell-free), ex vivo or in vivo (cell or
organism-based) recombinant expression systems.

The particular procedure used to introduce the nucleic acids into a host cell for
expression of a protein or nucleic acid is application specific. Many procedures for
introducing foreign nucleotide sequences into host cells may be used. These include the use
of calcium phosphate transfection, spheroplasts, electroporation, liposomes, microinjection,
plasmid vectors, viral vectors and any of the other well known methods for introducing cloned
genomic DNA, cDNA, synthetic DNA or other foreign genetic material into a host cell (see,
e.g., Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology
volume 152, Academic Press, Inc., San Diego, CA (Berger); F.M. Ausubel et al., eds., Current
Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley &
Sons, Inc. (supplemented through 1999); and Sambrook et al., Molecular Cloning - A
Laboratory Manual (2nd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring
Harbor, New York, 1989).

In a preferred embodiment, angiogenesis proteins and modulators are
administered as therapeutic agents, and can be formulated as outlined above. Similarly,
angiogenesis genes (including both the full-length sequence, partial sequences, or regulatory
sequences of the angiogenesis coding regions) can be administered in a gene therapy
application. These angiogenesis genes can include antisense applications, either as gene
therapy (i.e., for incorporation into the genome) or as antisense compositions, as will be
appreciated by those in the art.

Angiogenesis polypeptides and polynucleotides can also be administered as
vaccine compositions to stimulate HTL, CTL and antibody responses. Such vaccine
compositions can include, for example, lipidated peptides (e.g., Vitiello, A. et al., J. Clin.
Invest. 95:341, 1995), peptide compositions encapsulated in poly(DL-lactide-co-glycolide)
("PLG") microspheres (see, e.g., Eldridge et al., Mol. Immunol. 28:287-294, 1991; Alonso
et al., Vaccine 12:299-306, 1994; Jones et al., Vaccine 13:675-681, 1995), peptide
compositions contained in immune stimulating complexes (ISCOMS) (see, e.g., Takahashi et
al., Nature 344:873-875, 1990; Hu et al., Clin. Exp. Immunol. 113:235-243, 1998), multiple
antigen peptide systems (MAPs) (see, e.g., Tam, J. P., Proc. Natl. Acad. Sci. U.S.A.
85:5409-5413, 1988; Tam, J. P., J. Immunol. Methods 196:17-32, 1996), peptides formulated
as multivalent peptides, peptides for use in ballistic delivery systems (typically crystallized
peptides), viral delivery vectors (Perkus, M. E. et al., In: Concepts in vaccine development,
Kaufmann, S. H. E., ed., p. 379, 1996; Chakrabarti, S. et al., Nature 320:535, 1986; Hu, S. L.
et al., Nature 320:537, 1986; Kieny, M.-P. et al., AIDS Bio/Technology 4:790, 1986; Top, F.
H. et al., J. Infect. Dis. 124:148, 1971; Chanda, P. K. et al., Virology 175:535, 1990),
particles of viral or synthetic origin (e.g., Kofler, N. et al., J. Immunol. Methods 192:25,
1996; Eldridge, J. H. et al., Sem. Hematol. 30:16, 1993; Falo, L. D., Jr. et al., Nature Med.
7:649, 1995), adjuvants (Warren, H. S., Vogel, F. R., and Chedid, L. A., Annu. Rev. Immunol.
4:369, 1986; Gupta, R. K. et al., Vaccine 11:293, 1993), liposomes (Reddy, R. et al., J.
Immunol. 148:1585, 1992; Rock, K. L., Immunol. Today 17:131, 1996), or naked or
particle-absorbed cDNA (Ulmer, J. B. et al., Science 259:1745, 1993; Robinson, H. L., Hunt,
L. A., and Webster, R. G., Vaccine 11:957, 1993; Shiver, J. W. et al., In: Concepts in vaccine
development, Kaufmann, S. H. E., ed., p. 423, 1996; Cease, K. B., and Berzofsky, J. A.,
Annu. Rev. Immunol. 12:923, 1994; and Eldridge, J. H. et al., Sem. Hematol. 30:16, 1993).
Toxin-targeted delivery technologies, also known as receptor-mediated targeting, such as
those of Avant Immunotherapeutics, Inc. (Needham, Massachusetts) may also be used.

Vaccine compositions often include adjuvants. Many adjuvants contain a
substance designed to protect the antigen from rapid catabolism, such as aluminum hydroxide
or mineral oil, and a stimulator of immune responses, such as lipid A, Bordetella pertussis or
Mycobacterium tuberculosis derived proteins. Certain adjuvants are commercially available
as, for example, Freund's Incomplete Adjuvant and Complete Adjuvant (Difco Laboratories,
Detroit, MI); Merck Adjuvant 65 (Merck and Company, Inc., Rahway, NJ); AS-2
(SmithKline Beecham, Philadelphia, PA); aluminum salts such as aluminum hydroxide gel
(alum) or aluminum phosphate; salts of calcium, iron or zinc; an insoluble suspension of
acylated tyrosine; acylated sugars; cationically or anionically derivatized polysaccharides;
polyphosphazenes; biodegradable microspheres; monophosphoryl lipid A and quil A.
Cytokines, such as GM-CSF, interleukin-2, -7, -12, and other like growth factors, may also be
used as adjuvants.

Vaccines can be administered as nucleic acid compositions wherein DNA or
RNA encoding one or more of the polypeptides, or a fragment thereof, is administered to a
patient. This approach is described, for instance, in Wolff et al., Science 247:1465 (1990) as
well as U.S. Patent Nos. 5,580,859; 5,589,466; 5,804,566; 5,739,118; 5,736,524; 5,679,647;
WO 98/04720; and in more detail below. Examples of DNA-based delivery technologies
include "naked DNA", facilitated (bupivacaine, polymers, peptide-mediated) delivery,
cationic lipid complexes, and particle-mediated ("gene gun") or pressure-mediated delivery
(see, e.g., U.S. Patent No. 5,922,687).

For therapeutic or prophylactic immunization purposes, the peptides of the
invention can be expressed by viral or bacterial vectors. Examples of expression vectors
include attenuated viral hosts, such as vaccinia or fowlpox. This approach involves the use of
vaccinia virus, for example, as a vector to express nucleotide sequences that encode
angiogenic polypeptides or polypeptide fragments. Upon introduction into a host, the
recombinant vaccinia virus expresses the immunogenic peptide, and thereby elicits an
immune response. Vaccinia vectors and methods useful in immunization protocols are
described in, e.g., U.S. Patent No. 4,722,848. Another vector is BCG (Bacille Calmette-
Guerin). BCG vectors are described in Stover et al., Nature 351:456-460 (1991). A wide
variety of other vectors useful for therapeutic administration or immunization, e.g., adeno and
adeno-associated virus vectors, retroviral vectors, Salmonella typhi vectors, detoxified
anthrax toxin vectors, and the like, will be apparent to those skilled in the art from the
description herein (see, e.g., Shata et al. (2000) Mol. Med. Today 6: 66-71; Shedlock et al.,
J. Leukoc. Biol. 68:793-806, 2000; Hipp et al., In Vivo 14:571-85, 2000).

Methods for the use of genes as DNA vaccines are well known, and include 
placing an angiogenesis gene or portion of an angiogenesis gene under the control of a 
regulatable promoter or a tissue-specific promoter for expression in an angiogenesis patient. 
The angiogenesis gene used for DNA vaccines can encode full-length angiogenesis proteins, 
but more preferably encodes portions of the angiogenesis proteins including peptides derived 
from the angiogenesis protein. In one embodiment, a patient is immunized with a DNA 
vaccine comprising a plurality of nucleotide sequences derived from an angiogenesis gene. 
For example, angiogenesis-associated genes or sequences encoding subfragments of an
angiogenesis protein are introduced into expression vectors and tested for their 
immunogenicity in the context of Class I MHC and an ability to generate cytotoxic T cell 
responses. This procedure provides for production of cytotoxic T cell responses against cells 
which present antigen, including intracellular epitopes. 

In a preferred embodiment, the DNA vaccines include a gene encoding an 
adjuvant molecule with the DNA vaccine. Such adjuvant molecules include cytokines that 
increase the immunogenic response to the angiogenesis polypeptide encoded by the DNA 
vaccine. Additional or alternative adjuvants are available. 

In another preferred embodiment, angiogenesis genes find use in generating
animal models of angiogenesis. When the angiogenesis gene identified is repressed or
diminished in angiogenic tissue, gene therapy technology, e.g., antisense RNA directed to the
angiogenesis gene, can be used to diminish or repress expression of the gene in the animal.
Animal models of angiogenesis find use in screening for modulators of an angiogenesis-
associated sequence or modulators of angiogenesis. Similarly, transgenic animal technology,
including gene knockout technology, for example as a result of homologous recombination
with an appropriate gene targeting vector, will result in the absence or increased expression
of the angiogenesis protein. When desired, tissue-specific expression or knockout of the
angiogenesis protein may be necessary.

It is also possible that the angiogenesis protein is overexpressed in
angiogenesis. As such, transgenic animals can be generated that overexpress the
angiogenesis protein. Depending on the desired expression level, promoters of various
strengths can be employed to express the transgene. Also, the number of copies of the
integrated transgene can be determined and compared for a determination of the expression
level of the transgene. Animals generated by such methods find use as animal models of
angiogenesis and are additionally useful in screening for modulators to treat angiogenesis.

Kits for Use in Diagnostic and/or Prognostic Applications 

For use in diagnostic, research, and therapeutic applications suggested above,
kits are also provided by the invention. In the diagnostic and research applications such kits
may include any or all of the following: assay reagents, buffers, angiogenesis-specific nucleic
acids or antibodies, hybridization probes and/or primers, antisense polynucleotides,
ribozymes, dominant negative angiogenesis polypeptides or polynucleotides, small-molecule
inhibitors of angiogenesis-associated sequences, etc. A therapeutic product may include
sterile saline or another pharmaceutically acceptable emulsion and suspension base.

In addition, the kits may include instructional materials containing directions
(i.e., protocols) for the practice of the methods of this invention. While the instructional
materials typically comprise written or printed materials, they are not limited to such. Any
medium capable of storing such instructions and communicating them to an end user is
contemplated by this invention. Such media include, but are not limited to, electronic storage
media (e.g., magnetic discs, tapes, cartridges, chips), optical media (e.g., CD-ROM), and the
like. Such media may include addresses to internet sites that provide such instructional
materials.

The present invention also provides for kits for screening for modulators of
angiogenesis-associated sequences. Such kits can be prepared from readily available
materials and reagents. For example, such kits can comprise one or more of the following
materials: an angiogenesis-associated polypeptide or polynucleotide, reaction tubes, and
instructions for testing angiogenesis-associated activity. Optionally, the kit contains
biologically active angiogenesis protein. A wide variety of kits and components can be
prepared according to the present invention, depending upon the intended user of the kit and
the particular needs of the user. Diagnosis would typically involve evaluation of a plurality
of genes or products. The genes will be selected based on correlations with important
parameters in disease which may be identified in historical or outcome data.

It is understood that the examples described above in no way serve to limit 
the true scope of this invention, but rather are presented for illustrative purposes. All 
publications, sequences of accession numbers, and patent applications cited in this 
specification are herein incorporated by reference as if each individual publication or patent 
application were specifically and individually indicated to be incorporated by reference. 



EXAMPLES 



Example 1: Tissue Preparation, Labeling Chips, and Fingerprints
Purify total RNA from tissue using TRIzol Reagent

Homogenize tissue samples in 1ml of TRIzol per 50mg of tissue using a
Polytron 3100 homogenizer. The generator/probe used depends upon the tissue size; a
generator that is too large for the amount of tissue to be homogenized will cause a loss of
sample and lower RNA yield. TRIzol is added directly to frozen tissue, which is then
homogenized. Following homogenization, insoluble material is removed by centrifugation at
7500 x g for 15 min in a Sorvall superspeed centrifuge or at 12,000 x g for 10 min in an
Eppendorf centrifuge at 4°C. The clear homogenate is transferred to a new tube for use. At
this point the samples may be frozen at -60° to -70°C (and kept for at least one month). The
homogenate is mixed with 0.2ml of chloroform per 1ml of TRIzol reagent used in the original
homogenization and incubated at room temperature for 2-3 minutes. The aqueous phase is then
separated by centrifugation and transferred to a fresh tube, and the RNA is precipitated using
isopropyl alcohol. The pellet is isolated by centrifugation, washed, air-dried, resuspended in
an appropriate volume of DEPC H2O, and the absorbance is measured.
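The reagent volumes in this step scale linearly with tissue mass. The sketch below computes TRIzol and chloroform volumes from the 1 ml per 50 mg and 0.2 ml per 1 ml ratios stated above. Rounding TRIzol up to whole millilitres is an assumption added for illustration, and the function name is hypothetical.

```python
import math


def trizol_volumes(tissue_mg: float) -> dict:
    """Return TRIzol and chloroform volumes (ml) for a tissue sample.

    Ratios from the protocol: 1 ml TRIzol per 50 mg tissue and 0.2 ml
    chloroform per 1 ml TRIzol. Rounding TRIzol up to a whole millilitre
    is an illustrative assumption, not part of the protocol.
    """
    if tissue_mg <= 0:
        raise ValueError("tissue mass must be positive")
    trizol_ml = math.ceil(tissue_mg / 50.0)
    chloroform_ml = round(0.2 * trizol_ml, 3)  # round away float noise
    return {"trizol_ml": trizol_ml, "chloroform_ml": chloroform_ml}


# 120 mg of tissue -> 3 ml TRIzol and 0.6 ml chloroform.
print(trizol_volumes(120))
```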

Purification of poly A+ mRNA from total RNA is performed as follows. Heat
the Oligotex suspension to 37°C and mix immediately before adding to RNA. Heat the
Elution Buffer at 70°C. Warm up 2 x Binding Buffer at 65°C if there is precipitate
in the buffer. Mix total RNA with DEPC-treated water, 2 x Binding Buffer, and Oligotex
according to Table 2 on page 16 of the Oligotex Handbook. Incubate for 3 minutes at 65°C.
Incubate for 10 minutes at room temperature. Centrifuge for 2 minutes at 14,000 to 18,000 g.
Remove the supernatant without disturbing the Oligotex pellet; a little solution can be left
behind to reduce the loss of Oligotex. Gently resuspend in Wash Buffer OW2 and pipet onto a
spin column. Centrifuge the spin column at full speed for 1 minute. Transfer the spin column
to a new collection tube, gently resuspend in Wash Buffer OW2 and centrifuge as described
above. Transfer the spin column to a new tube and elute with 20 to 100 ul of preheated (70°C)
Elution Buffer. Gently resuspend the Oligotex resin by pipetting up and down. Centrifuge as
above. Repeat elution with fresh elution buffer, or use the first eluate to keep the elution
volume low. Read absorbance, using diluted Elution Buffer as the blank. Before proceeding
with cDNA synthesis, precipitate the mRNA as follows: add 0.4 vol. of 7.5 M NH4OAc + 2.5
vol. of cold 100% ethanol. Precipitate at -20°C for 1 hour to overnight (or 20-30 min at
-70°C). Centrifuge at 14,000-16,000 x g for 30 minutes at 4°C. Wash the pellet with 0.5ml of
80% ethanol (-20°C), then centrifuge at 14,000-16,000 x g for 5 minutes at room temperature.
Repeat the 80% ethanol wash. Air dry the ethanol from the pellet in the hood. Suspend the
pellet in DEPC H2O at 1ug/ul concentration.
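The ethanol precipitation step can be expressed as a short calculation. The sketch below assumes that both the 0.4 volume of 7.5 M NH4OAc and the 2.5 volumes of ethanol are measured relative to the original mRNA eluate; the text does not say this explicitly. It also reports the resulting ammonium acetate concentration and the DEPC H2O volume needed for the stated 1 ug/ul target. The function and key names are hypothetical.

```python
def precipitation_plan(eluate_ul: float, mrna_ug: float,
                       target_ug_per_ul: float = 1.0) -> dict:
    """Volumes for NH4OAc/ethanol precipitation of an mRNA eluate.

    Assumes (not stated in the protocol) that both ratios, 0.4 vol of
    7.5 M NH4OAc and 2.5 vol of ethanol, refer to the original eluate.
    """
    salt_ul = 0.4 * eluate_ul
    ethanol_ul = 2.5 * eluate_ul
    total_ul = eluate_ul + salt_ul + ethanol_ul
    final_nh4oac_m = 7.5 * salt_ul / total_ul  # molar, after mixing
    resuspend_ul = mrna_ug / target_ug_per_ul  # DEPC H2O for 1 ug/ul
    return {
        "nh4oac_ul": salt_ul,
        "ethanol_ul": ethanol_ul,
        "final_nh4oac_M": final_nh4oac_m,
        "resuspend_ul": resuspend_ul,
    }


# For a 100 ul eluate holding 2 ug mRNA: about 40 ul NH4OAc, 250 ul ethanol,
# roughly 0.77 M final NH4OAc, and 2 ul of water for resuspension.
print(precipitation_plan(100.0, 2.0))
```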

To further clean up total RNA using Qiagen's RNeasy kit, add no more than
100ug to an RNeasy column. Adjust the sample to a volume of 100ul with RNase-free water.
Add 350ul Buffer RLT, then 250ul ethanol (100%), to the sample. Mix by pipetting (do not
centrifuge), then apply the sample to an RNeasy mini spin column. Centrifuge for 15 sec at
>10,000rpm. Transfer the column to a new 2-ml collection tube. Add 500ul Buffer RPE and
centrifuge for 15 sec at >10,000rpm. Discard the flowthrough. Add 500ul Buffer RPE and
centrifuge for 15 sec at >10,000rpm. Discard the flowthrough, then centrifuge for 2 min at
maximum speed to dry the column membrane. Transfer the column to a new 1.5-ml collection
tube and apply 30-50ul of RNase-free water directly onto the column membrane. Centrifuge
1 min at >10,000rpm. Repeat the elution and read absorbance.
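The sample-preparation arithmetic for the RNeasy clean-up can be sketched as follows. The function name is hypothetical, and the RNA concentration is an input the user would measure. The 100 ug column limit, the 100 ul sample volume and the 350 ul / 250 ul additions come from the text. The second check, which rejects RNA that is too dilute to fit in 100 ul, is an added assumption.

```python
def rneasy_load(rna_ug: float, rna_ug_per_ul: float) -> dict:
    """Volumes for loading total RNA onto an RNeasy mini column.

    From the protocol: at most 100 ug per column, sample adjusted to
    100 ul with RNase-free water, then 350 ul Buffer RLT and 250 ul
    100% ethanol. Rejecting RNA too dilute to fit in 100 ul is an
    illustrative assumption.
    """
    if rna_ug > 100:
        raise ValueError("load no more than 100 ug of RNA per column")
    rna_ul = rna_ug / rna_ug_per_ul
    if rna_ul > 100:
        raise ValueError("RNA volume exceeds 100 ul")
    return {
        "rna_ul": rna_ul,
        "water_ul": 100.0 - rna_ul,
        "buffer_rlt_ul": 350.0,
        "ethanol_ul": 250.0,
        "total_ul": 100.0 + 350.0 + 250.0,
    }


# 50 ug of RNA at 2 ug/ul: 25 ul RNA plus 75 ul water before adding RLT.
print(rneasy_load(50.0, 2.0))
```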

cDNA synthesis using Gibco's "Superscript Choice System for cDNA Synthesis" kit 

First-strand cDNA synthesis is performed as follows. Use 5 ug of total RNA
or 1 ug of polyA+ mRNA as starting material. For total RNA, use 2 ul of Superscript RT. For
polyA+ mRNA, use 1 ul of Superscript RT. The final volume of the first-strand synthesis mix is
20 ul. The RNA must be in a volume no greater than 10 ul. Incubate the RNA with 1 ul of 100 pmol
T7-T24 oligo for 10 min at 70°C. On ice, add 7 ul of: 4 ul 5X 1st Strand Buffer, 2 ul of 0.1 M
DTT, and 1 ul of 10 mM dNTP mix. Incubate at 37°C for 2 min, then add Superscript RT.
Incubate at 37°C for 1 hour.
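The first-strand recipe fixes every component except the RNA volume, so the balance to 20 ul has to come from water. A minimal sketch, assuming DEPC water makes up that balance (the text does not name the diluent); the function name is hypothetical.

```python
def first_strand_mix(rna_ul, polya):
    """Component volumes (ul) for the 20 ul first-strand reaction.

    polya=True  -> 1 ug polyA+ mRNA with 1 ul Superscript RT
    polya=False -> 5 ug total RNA with 2 ul Superscript RT
    Assumption: DEPC water makes up the balance to 20 ul.
    """
    if rna_ul > 10:
        raise ValueError("RNA must be in 10 ul or less")
    components = {
        "T7-T24 oligo (100 pmol)": 1.0,
        "5X 1st Strand Buffer": 4.0,
        "0.1 M DTT": 2.0,
        "10 mM dNTP mix": 1.0,
        "Superscript RT": 1.0 if polya else 2.0,
    }
    water_ul = 20.0 - rna_ul - sum(components.values())
    if water_ul < 0:
        raise ValueError("components exceed 20 ul")
    return {"RNA": rna_ul, "DEPC water": water_ul, **components}
```

With total RNA in the maximum 10 ul, no water is needed; 1 ug of polyA+ mRNA in 5 ul leaves 6 ul of water.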

For the second-strand synthesis, place the first-strand reactions on ice and add: 91 ul
DEPC H2O; 30 ul 5X 2nd Strand Buffer; 3 ul 10 mM dNTP mix; 1 ul 10 U/ul E. coli DNA
Ligase; 4 ul 10 U/ul E. coli DNA Polymerase; and 1 ul 2 U/ul RNase H. Mix and incubate 2
hours at 16°C. Add 2 ul T4 DNA Polymerase. Incubate 5 min at 16°C. Add 10 ul of 0.5 M
EDTA. A further clean-up of the DNA is performed by phenol:chloroform:isoamyl alcohol
(25:24:1) purification.
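Adding the second-strand components to the 20 ul first-strand reaction gives a 150 ul reaction in which the 5X buffer is at 1X. The short check below makes that arithmetic explicit; it only sums the volumes listed above (the later T4 DNA Polymerase and EDTA additions are left out) and the dictionary layout is illustrative.

```python
FIRST_STRAND_UL = 20.0

# (volume ul, enzyme concentration in U/ul or None) for each second-strand addition
SECOND_STRAND = {
    "DEPC water": (91.0, None),
    "5X 2nd Strand Buffer": (30.0, None),
    "10 mM dNTP mix": (3.0, None),
    "E. coli DNA ligase": (1.0, 10.0),
    "E. coli DNA polymerase": (4.0, 10.0),
    "RNase H": (1.0, 2.0),
}


def second_strand_summary():
    """Total volume (ul), buffer strength (X) and enzyme units per reaction."""
    total_ul = FIRST_STRAND_UL + sum(vol for vol, _ in SECOND_STRAND.values())
    buffer_x = 5.0 * SECOND_STRAND["5X 2nd Strand Buffer"][0] / total_ul
    units = {name: vol * conc for name, (vol, conc) in SECOND_STRAND.items() if conc}
    return total_ul, buffer_x, units
```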

In vitro transcription (IVT) and labeling with biotin are performed as follows:

Pipet 1.5 ul of cDNA into a thin-wall PCR tube. Make the NTP labeling mix by combining 2 ul T7
10x ATP (75 mM) (Ambion); 2 ul T7 10x GTP (75 mM) (Ambion); 1.5 ul T7 10x CTP (75 mM)
(Ambion); 1.5 ul T7 10x UTP (75 mM) (Ambion); 3.75 ul 10 mM Bio-11-UTP
(Boehringer-Mannheim/Roche or Enzo); 3.75 ul 10 mM Bio-16-CTP (Enzo); 2 ul 10x T7 transcription
buffer (Ambion); and 2 ul 10x T7 enzyme mix (Ambion). The final volume is 20 ul. Incubate
6 hours at 37°C in a PCR machine. The RNA can be further cleaned up.
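The IVT recipe can be checked by summing the volumes and computing each nucleotide's final concentration with C1V1 = C2V2. The sketch below uses only the stock concentrations and volumes listed above; the dictionary and function names are illustrative.

```python
# (stock concentration mM, volume ul) for each nucleotide in the IVT reaction
NUCLEOTIDES = {
    "ATP": (75.0, 2.0),
    "GTP": (75.0, 2.0),
    "CTP": (75.0, 1.5),
    "UTP": (75.0, 1.5),
    "Bio-11-UTP": (10.0, 3.75),
    "Bio-16-CTP": (10.0, 3.75),
}
OTHER_UL = {"cDNA": 1.5, "10x T7 transcription buffer": 2.0, "10x T7 enzyme mix": 2.0}


def ivt_summary():
    """Total volume (ul), final nucleotide concentrations (mM), biotin fractions."""
    total_ul = sum(v for _, v in NUCLEOTIDES.values()) + sum(OTHER_UL.values())
    final_mM = {n: c * v / total_ul for n, (c, v) in NUCLEOTIDES.items()}
    # fraction of each pyrimidine supplied as the biotinylated analog
    biotin_fraction = {
        "UTP": final_mM["Bio-11-UTP"] / (final_mM["UTP"] + final_mM["Bio-11-UTP"]),
        "CTP": final_mM["Bio-16-CTP"] / (final_mM["CTP"] + final_mM["Bio-16-CTP"]),
    }
    return total_ul, final_mM, biotin_fraction
```

With these volumes the reaction totals 20 ul, every nucleotide ends at 7.5 mM in total, and one quarter of the UTP and CTP is biotinylated.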

Fragmentation is performed as follows. Usually 15 ug of labeled RNA is
fragmented. Keep the fragmentation reaction volume as small as possible; a 10 ul volume is
recommended, but 20 ul is acceptable. Do not exceed 20 ul, because the magnesium in
the fragmentation buffer contributes to precipitation in the hybridization buffer. Fragment
the RNA by incubation at 94°C for 35 minutes in 1x Fragmentation buffer (5x Fragmentation
buffer is 200 mM Tris-acetate, pH 8.1; 500 mM KOAc; 150 mM MgOAc). The labeled
RNA transcript can be analyzed before and after fragmentation. Samples can be heated to
65°C for 15 minutes and electrophoresed on 1% agarose/TBE gels to get an approximate idea
of the transcript size range.
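Because the fragmentation buffer is made at 5x, one fifth of the reaction volume is buffer. A sketch, assuming the remainder is labeled RNA plus water; the helper name is hypothetical.

```python
FIVE_X_MM = {"Tris-acetate, pH 8.1": 200.0, "KOAc": 500.0, "MgOAc": 150.0}


def fragmentation_setup(reaction_ul=10.0):
    """Split a fragmentation reaction into 5x buffer and RNA/water volumes."""
    if reaction_ul > 20.0:
        raise ValueError("do not exceed 20 ul (magnesium precipitates in hyb buffer)")
    buffer_ul = reaction_ul / 5.0  # dilute the 5x stock to 1x
    return {
        "5x buffer ul": buffer_ul,
        "RNA + water ul": reaction_ul - buffer_ul,
        "final mM": {k: v / 5.0 for k, v in FIVE_X_MM.items()},
    }
```

A 10 ul reaction takes 2 ul of 5x buffer, giving 40 mM Tris-acetate, 100 mM KOAc and 30 mM MgOAc at 1x.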

For hybridization, 200 ul of hybridization mix (containing 10 ug cRNA) is put on the
chip. If multiple hybridizations are to be done (such as cycling through a 5-chip set), it
is recommended that an initial hybridization mix of 300 ul or more be made. The
hybridization mix is: fragmented labeled RNA (50 ng/ul final conc.); 50 pM 948-b control
oligo; 1.5 pM BioB; 5 pM BioC; 25 pM BioD; 100 pM CRE; 0.1 mg/ml herring sperm DNA;
0.5 mg/ml acetylated BSA; and 1x MES hyb buffer to 300 ul.
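At 50 ng/ul, the 200 ul loaded on one chip carries 10 ug of cRNA, and a 300 ul master mix needs 15 ug, which matches the 15 ug usually fragmented. The helpers below make that arithmetic explicit. The 10 mg/ml herring sperm DNA stock in the example is the stock named in the spotted-array sample preparation in this Example; other stock concentrations would have to be supplied by the user, and the function names are illustrative.

```python
def crna_needed_ug(mix_ul, final_ng_per_ul=50.0):
    """Micrograms of fragmented cRNA for a hybridization mix of mix_ul."""
    return mix_ul * final_ng_per_ul / 1000.0


def stock_volume_ul(final_conc, stock_conc, total_ul):
    """C1*V1 = C2*V2: stock volume giving final_conc in total_ul (same units)."""
    return final_conc * total_ul / stock_conc
```

For example, 0.1 mg/ml herring sperm DNA in a 300 ul mix needs 3 ul of a 10 mg/ml stock.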
Labeling is performed as follows. The hybridization reaction includes non-biotinylated
IVT (purified by RNeasy columns): IVT antisense RNA, 4 ug:ul; random hexamers (1 ug/ul),
4 ul; and water to 14 ul. The reaction is incubated at 70°C for 10 min.
Reverse transcription is performed in the following reaction: 5X First Strand (BRL) buffer,
6 ul; 0.1 M DTT, 3 ul; 50X dNTP mix, 0.6 ul; H2O, 2.4 ul; Cy3 or Cy5 dUTP (1 mM), 3 ul;
SS RT II (BRL), 1 ul; in a final volume of 16 ul. Add this to the hybridization reaction.
Incubate 30 min at 42°C. Add 1 ul SSII and incubate another hour. Put on ice. The 50X dNTP
mix (25 mM each of cold dATP, dCTP, and dGTP, and 10 mM dTTP) is 25 ul each of 100 mM dATP,
dCTP, and dGTP and 10 ul of 100 mM dTTP, plus 15 ul H2O (dNTPs from Pharmacia).
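The labeling volumes fit together: the 14 ul annealing reaction plus the 16 ul reverse-transcription mix gives 30 ul, in which both the 5X buffer and the 50X dNTP mix end at 1X. The sketch below checks this and the 50X dNTP recipe; it ignores the extra 1 ul of SSII added later, and the names are illustrative.

```python
ANNEAL_UL = 14.0  # RNA + random hexamers + water, annealed at 70 C

RT_MIX_UL = {
    "5X First Strand buffer": 6.0,
    "0.1 M DTT": 3.0,
    "50X dNTP mix": 0.6,
    "H2O": 2.4,
    "Cy3 or Cy5 dUTP (1 mM)": 3.0,
    "SuperScript II": 1.0,
}


def rt_check():
    """RT mix volume, combined volume (ul) and buffer/dNTP strengths (X)."""
    rt_ul = sum(RT_MIX_UL.values())
    total_ul = ANNEAL_UL + rt_ul
    fold = {
        "first strand buffer": 5.0 * RT_MIX_UL["5X First Strand buffer"] / total_ul,
        "dNTP mix": 50.0 * RT_MIX_UL["50X dNTP mix"] / total_ul,
    }
    return rt_ul, total_ul, fold


def dntp_50x():
    """Total volume (ul) and concentrations (mM) of the 50X mix from 100 mM stocks."""
    parts_ul = {"dATP": 25.0, "dCTP": 25.0, "dGTP": 25.0, "dTTP": 10.0}
    total_ul = sum(parts_ul.values()) + 15.0  # plus 15 ul water
    return total_ul, {n: 100.0 * v / total_ul for n, v in parts_ul.items()}
```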
RNA degradation is performed as follows. Add 86 ul H2O and 1.5 ul 1 M NaOH/2 mM EDTA
and incubate at 65°C for 10 min. For U-Con 30, add 500 ul TE/sample and spin at 7000 g
for 10 min; save the flow-through for purification. For Qiagen purification, suspend the
U-Con recovered material in 500 ul buffer PB and proceed using the Qiagen protocol. For DNase
digestion, add 1 ul of a 1/100 dilution of DNase per 30 ul reaction and incubate at 37°C for
15 min. Incubate for 5 min at 95°C to denature the DNase.

For sample preparation, add Cot-1 DNA, 10 ul; 50X dNTPs, 1 ul; 20X SSC, 2.3 ul; Na
pyrophosphate, 7.5 ul; and 10 mg/ml herring sperm DNA, 1 ul of a 1/10 dilution; to a final
volume of 21.8 ul. Dry in a speed vac. Resuspend in 15 ul H2O. Add 0.38 ul 10% SDS. Heat at
95°C for 2 min and cool slowly at room temperature for 20 min. Put on a slide and hybridize
overnight at 64°C.
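The listed additions sum to exactly the stated 21.8 ul, and drying them down and resuspending in 15.38 ul concentrates the SSC to roughly 3X. The sketch below reproduces that arithmetic under two assumptions not stated in the text: all solutes redissolve, and nothing else (for example the labeled probe) adds to the resuspension volume. Names are illustrative.

```python
# Volumes (ul) listed for the spotted-array sample preparation
SAMPLE_PREP_UL = {
    "Cot-1 DNA": 10.0,
    "50X dNTPs": 1.0,
    "20X SSC": 2.3,
    "Na pyrophosphate": 7.5,
    "herring sperm DNA, 1/10 dilution of 10 mg/ml": 1.0,
}
WATER_UL = 15.0
SDS_10PCT_UL = 0.38


def after_dry_down():
    """Pre-drying volume (ul), then SSC (X) and SDS (%) after resuspension."""
    pre_dry_ul = sum(SAMPLE_PREP_UL.values())
    resuspended_ul = WATER_UL + SDS_10PCT_UL
    ssc_x = 20.0 * SAMPLE_PREP_UL["20X SSC"] / resuspended_ul
    sds_pct = 10.0 * SDS_10PCT_UL / resuspended_ul
    return pre_dry_ul, ssc_x, sds_pct
```

Under these assumptions the SSC ends just under 3X and the SDS at about 0.25%.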
Washing after the hybridization: 3X SSC/0.03% SDS, 2 min (37.5 ml 20X SSC + 0.75 ml
10% SDS in 250 ml H2O); 1X SSC, 5 min (12.5 ml 20X SSC in 250 ml H2O); 0.2X SSC, 5 min
(2.5 ml 20X SSC in 250 ml H2O). Dry the slides and scan at the appropriate PMT settings
and channels.
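Each wash is a dilution of 20X SSC and 10% SDS stocks. The sketch reads "in 250 ml H2O" as a final volume of 250 ml (an assumption; adding the stocks to 250 ml of water would give slightly lower strengths) and confirms the stated 3X/0.03%, 1X and 0.2X washes. The function name is illustrative.

```python
def wash_buffer(ssc20x_ml, sds10pct_ml=0.0, final_ml=250.0):
    """Final SSC strength (X) and SDS (%) when the stocks are brought to final_ml.

    Assumption: "in 250 ml H2O" means a final volume of 250 ml.
    """
    return 20.0 * ssc20x_ml / final_ml, 10.0 * sds10pct_ml / final_ml


WASHES = {
    "wash 1 (2 min)": wash_buffer(37.5, 0.75),
    "wash 2 (5 min)": wash_buffer(12.5),
    "wash 3 (5 min)": wash_buffer(2.5),
}
```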

Example 2. A model of angiogenesis is used to determine expression in angiogenesis

In the model of angiogenesis used to determine expression of angiogenesis-associated
sequences, human umbilical vein endothelial cells (HUVEC) were obtained, e.g., as passage 1
(p1) frozen cells from Cascade Biologics (Oregon) and grown in maintenance medium: Medium 199
(Life Technologies) supplemented with 20% pooled human serum, 100 mg/ml heparin and 75 mg/ml
endothelial cell growth supplements (Sigma), and gentamicin (Life Technologies). An in vitro
cell system model was used in which 2x10^5 HUVECs were cultured in 0.5 ml of 3 mg/ml
plasminogen-depleted fibrinogen (Calbiochem, San Diego, CA) that was polymerized by the
addition of 1 unit of maintenance medium supplemented with 100 ng/ml VEGF and HGF and
10 ng/ml TGF-α (R&D Systems, Minneapolis, MN) (growth medium). The growth medium was
replaced every 2 days. Samples for RNA were collected, e.g., at 0, 2, 6, 15, 24, 48, and 96
hours of culture. The fibrin clots were placed in Trizol (Life Technologies) and disrupted
using a Tissuemizer. Thereafter, standard procedures were used to extract the RNA (e.g.,
Example 1).
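The culture set-up implies a few derived quantities: the cell density in the gel, the fibrinogen per clot, and when medium changes fall within the 96 h time course. A sketch, assuming the first medium change occurs 48 h after seeding (the text says only "every 2 days"); the constant names are illustrative.

```python
CELLS_PER_GEL = 2e5
GEL_VOLUME_ML = 0.5
FIBRINOGEN_MG_PER_ML = 3.0
SAMPLE_TIMES_H = [0, 2, 6, 15, 24, 48, 96]
MEDIUM_CHANGE_INTERVAL_H = 48  # "replaced every 2 days"


def culture_plan():
    """Cells per ml of gel, fibrinogen (mg) per clot, medium-change times (h)."""
    cells_per_ml = CELLS_PER_GEL / GEL_VOLUME_ML
    fibrinogen_mg = FIBRINOGEN_MG_PER_ML * GEL_VOLUME_ML
    last_h = max(SAMPLE_TIMES_H)
    # assumption: the first change falls one interval after seeding
    changes_h = list(range(MEDIUM_CHANGE_INTERVAL_H, last_h + 1, MEDIUM_CHANGE_INTERVAL_H))
    return cells_per_ml, fibrinogen_mg, changes_h
```

This gives 4x10^5 cells per ml of gel, 1.5 mg of fibrinogen per clot, and medium changes at 48 h and 96 h.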

Angiogenesis-associated sequences thus identified are shown in Table 1. As
indicated, some of the Accession numbers include expressed sequence tags (ESTs). Thus, in
one embodiment herein, genes within an expression profile, also termed expression profile
genes, include ESTs and are not necessarily full length.
Table 1



AAA4 DNA sequence 
Gene name: CGI -100 protein 
Unigene number: Hs.275253

Probeset Accession #: AA089688 
Nucleic Acid Accession #: NM_016040 cluster 

Coding sequence: 142-831 (predicted start/stop codons underlined)

GTTCGCCGCC GCCGCGCCGG CCACCTGGAG TTTTTTCAGA CTCCAGATTT CCCTGTCAAC 60

CACGAGGAGT CCAGAGAGGA AACGCGGAGC GGAGACAACA GTACCTGACG CCTCTTTCAG 120 

CCCGGGATCG CCCCAGCAGG GATGGGCGAC AAGATCTGGC TGCCCTTCCC CGTGCTCCTT 180 

CTGGCCGCTC TGCCTCCGGT GCTGCTGCCT GGGGCGGCCG GCTTCACACC TTCCCTCGAT 240 

AGCGACTTCA CCTTTACCCT TCCCGCCGGC CAGAAGGAGT GCTTCTACCA GCCCATGCCC 300

CTGAAGGCCT CGCTGGAGAT CGAGTACCAA GTTTTAGATG GAGCAGGATT AGATATTGAT 360

TTCCATCTTG CCTCTCCAGA AGGCAAAACC TTAGTTTTTG AACAAAGAAA ATCAGATGGA 420 

GTTCACACTG TAGAGACTGA AGTTGGTGAT TACATGTTCT GCTTTGACAA TACATTCAGC 480

ACCATTTCTG AGAAGGTGAT TTTCTTTGAA TTAATCCTGG ATAATATGGG AGAACAGGCA 540 

CAAGAACAAG AAGATTGGAA GAAATATATT ACTGGCACAG ATATATTGGA TATGAAACTG 600 

GAAGACATCC TGGAATCCAT CAACAGCATC AAGTCCAGAC TAAGCAAAAG TGGGCACATA 660

CAAACTCTGC TTAGAGCATT TGAAGCTCGT GATCGAAACA TACAAGAAAG CAACTTTGAT 720 

AGAGTCAATT TCTGGTCTAT GGTTAATTTA GTGGTCATGG TGGTGGTGTC AGCCATTCAA 780 

GTTTATATGC TGAAGAGTCT GTTTGAAGAT AAGAGGAAAA GTAGAACTTA AAACTCCAAA 840

CTAGAGTACG TAACATTGAA AAATGAGGCA TAAAAATGCA ATAAACTGTT ACAGTCAAGA 900

CCATTAATGG TCTTCTCCAA AATATTTTGA GATATAAAAG TAGGAAACAG GTATAATTTT 960

AATGTGAAAA TTAAGTCTTC ACTTTCTGTG CAAGTAATCC TGCTGATCCA GTTGTACTTA 1020 

AGTGTGTAAC AGGAATATTT TGCAGAATAT AGGTTTAACT GAATGAAGCC ATATTAATAA 1080

CTGCATTTTC CTAACTTTGA AAAATTTTGC AAATGTCTTA GGTGATTTAA ATAAATGAGT 1140

ATTGGGCCTA AA
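Each record lists bases in blocks of ten, with the 1-based position of the line's last base at the end of the line, and the "Coding sequence" range uses the same 1-based inclusive coordinates. The Python sketch below (hypothetical helper names) joins cleaned lines, checks each trailing coordinate against the running length, and extracts a sub-range. Applied to the AAA4 lines covering positions 121-240 and 781-840, it returns ATG at 142-144 and TAA at 829-831, consistent with the stated coding sequence 142-831 (690 bases, a whole number of codons).

```python
def parse_numbered_lines(lines):
    """Join lines like 'ACGTACGTAC ... 60' into (first_position, sequence).

    The trailing integer on each line is the 1-based position of that line's
    last base; it is checked against the running length of the sequence.
    """
    seq = ""
    first_pos = None
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        end = int(tokens[-1]) if tokens[-1].isdigit() else None
        chunk = "".join(tokens[:-1] if end is not None else tokens)
        if first_pos is None:
            # a first line without a coordinate is assumed to start at base 1
            first_pos = end - len(chunk) + 1 if end is not None else 1
        seq += chunk
        if end is not None and first_pos + len(seq) - 1 != end:
            raise ValueError(f"coordinate mismatch at position {end}")
    return first_pos, seq


def subsequence(first_pos, seq, start, stop):
    """Bases start..stop (1-based, inclusive), as in 'Coding sequence: 142-831'."""
    return seq[start - first_pos : stop - first_pos + 1]
```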


AAA7 DNA sequence

Gene name: Endothelial differentiation, sphingolipid G-protein-coupled receptor, 1 (EDG1)

Unigene number: Hs.154210
Probeset Accession #: M31210

Nucleic Acid Accession #: NM_001400 cluster 

Coding sequence: 251-1396 (predicted start/stop codons underlined) 

TCTAAAGGTC GGGGGCAGCA GCAAGATGCG AAGCGAGCCG TACAGATCCC GGGCTCTCCG 60

AACGCAACTT CGCCCTGCTT GAGCGAGGCT GCGGTTTCCG AGGCCCTCTC CAGCCAAGGA 120 

AAAGCTACAC AAAAAGCCTG GATCACTCAT CGAACCACCC CTGAAGCCAG TGAAGGCTCT 180 

CTCGCCTCGC CCTCTAGCGT TCGTCTGGAG TAGCGCCACC CCGGCTTCCT GGGGACACAG 240 

GGTTGGCACC ATGGGGCCCA CCAGCGTCCC GCTGGTCAAG GCCCACCGCA GCTCGGTCTC 300

TGACTACGTC AACTATGATA TCATCGTCCG GCATTACAAC TACACGGGAA AGCTGAATAT 360

CAGCGCGGAC AAGGAGAACA GCATTAAACT GACCTCGGTG GTGTTCATTC TCATCTGCTG 420 

CTTTATCATC CTGGAGAACA TCTTTGTCTT GCTGACCATT TGGAAAACCA AGAAATTCCA 480 

CCGACCCATG TACTATTTTA TTGGCAATCT GGCCCTCTCA GACCTGTTGG CAGGAGTAGC 540 

CTACACAGCT AACCTGCTCT TGTCTGGGGC CACCACCTAC AAGCTCACTC CCGCCCAGTG 600 

GTTTCTGCGG GAAGGGAGTA TGTTTGTGGC CCTGTCAGCC TCCGTGTTCA GTCTCCTCGC 660

CATCGCCATT GAGCGCTATA TCACAATGCT GAAAATGAAA CTCCACAACG GGAGCAATAA 720 

CTTCCGCCTC TTCCTGCTAA TCAGCGCCTG CTGGGTCATC TCCCTCATCC TGGGTGGCCT 780 

GCCTATCATG GGCTGGAACT GCATCAGTGC GCTGTCCAGC TGCTCCACCG TGCTGCCGCT 840

CTACCACAAG CACTATATCC TCTTCTGCAC CACGGTCTTC ACTCTGCTTC TGCTCTCCAT 900

CGTCATTCTG TACTGCAGAA TCTACTCCTT GGTCAGGACT CGGAGCCGCC GCCTGACGTT 960

CCGCAAGAAC ATTTCCAAGG CCAGCCGCAG CTCTGAGAAT GTGGCGCTGC TCAAGACCGT 1020 

AATTATCGTC CTGAGCGTCT TCATCGCCTG CTGGGCACCG CTCTTCATCC TGCTCCTGCT 1080 

GGATGTGGGC TGCAAGGTGA AGACCTGTGA CATCCTCTTC AGAGCGGAGT ACTTCCTGGT 1140

GTTACCTGTG CTCAACTCCG GCACCAACCC CATCATTTAC ACTCTGACCA ACAAGGAGAT 1200 

GCGT' JGGCC TTCATCCGGA TCATGTCCTG CTGCAAGTGC CCGAGCGGAG ACTCTGCTGG 1260

CAAATTCAAG CGACCCATCA TCGCCGGCAT GGAATTCAGC CGCAGCAAAT CGGACAATTC 1320

CTCCCACCCC CAGAAAGACG AAGGGGACAA CCCAGAGACC ATTATGTCTT CTGGAAACGT 1380

CAACTCTTCT TCCTAGAACT GGAAGCTGTC CACCCACCGG AAGCGCTCTT TACTTGGTCG 1440

CTGGCCACCC CAGTGTTTGG AAAAAAATCT CTGGGCTTCG ACTGCTGCCA GGGAGGAGCT 1500 

GCTGCAAGCC AGAGGGAGGA AGGGGGAGAA TACGAACAGC CTGGTGGTGT CGGGTGTTGG 1560

TGGGTAGAGT TAGTTCCTGT GAACAATGCA CTGGGAAGGG TGGAGATCAG GTCCCGGCCT 1620

GGAATATATA TTCTACCCCC CTGGAGCTTT GATTTTGCAC TGAGCCAAAG GTCTAGCATT 1680 

GTCAAGCTCC TAAAGGGTTC ATTTGGCCCC TCCTCAAAGA CTAATGTCCC CATGTGAAAG 1740 
CGTCTCTTTG TCTGGAGCTT TGAGGAGATG TTTTCCTTCA CTTTAGTTTC AAACCCAAGT 1800

GAGTGTGTGC ACTTCTGCTT CTTTAGGGAT GCCCTGTACA TCCCACACCC CACCCTCCCT 1860 

TCCCTTCATA CCCCTCCTCA ACGTTCTTTT ACTTTATACT TTAACTACCT GAGAGTTATC 1920 

AGAGCTGGGG TTGTGGAATG ATCGATCATC TATAGCAAAT AGGCTATGTT GAGTACGTAG 1980 

GCTGTGGGAA GATGAAGATG GTTTGGAGGT GTAAAACAAT GTCCTTCGCT GAGGCCAAAG 2040

TTTCCATGTA AGCGGGATCC GTTTTTTGGA ATTTGGTTGA AGTCACTTTG ATTTCTTTAA 2100 

AAAACATCTT TTCAATGAAA TGTGTTACCA TTTCATATCC ATTGAAGCCG AAATCTGCAT 2160 

AAGGAAGCCC ACTTTATCTA AATGATATTA GCCAGGATCC TTGGTGTCCT AGGAGAAACA 2220 

GACAAGCAAA ACAAAGTGAA AACCGAATGG ATTAACTTTT GCAAACCAAG GGAGATTTCT 2280 

TAGCAAATGA GTCTAACAAA TATGACATCC GTCTTTCCCA CTTTTGTTGA TGTTTATTTC 2340 

AGAATCTTGT GTGATTCATT TCAAGCAACA ACATGTTGTA TTTTGTTGTG TTAAAAGTAC 2400

TTTTCTTGAT TTTTGAATGT ATTTGTTTCA GGAAGAAGTC ATTTTATGGA TTTTTCTAAC 2460 

CCGTGTTAAC TTTTCTAGAA TCCACCCTCT TGTGCCCTTA AGCATTACTT TAACTGGTAG 2520 

GGAACGCCAG AACTTTTAAG TCCAGCTATT CATTAGATAG TAATTGAAGA TATGTATAAA 2580 

TATTACAAAG AATAAAAATA TATTACTGTC TCTTTAGTAT GGTTTTCAGT GCAATTAAAC 2640

CGAGAGATGT CTTGTTTTTT TAAAAAGAAT AGTATTTAAT AGGTTTCTGA CTTTTGTGGA 2700 
TCATTTTGCA CATAGCTTTA TCAACTTTTA AACATTAATA AACTGATTTT TTTAAAG 
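The underlined start and stop codons bracket an open reading frame that can be translated with the standard genetic code. The sketch below builds NCBI translation table 1 from a 64-letter string ordered by the bases T, C, A, G; the function name and the use of "X" for unreadable codons are choices made here, not taken from the source. Applied to the first 30 bases of the AAA7 coding sequence (positions 251-280: ATGGGGCCCA CCAGCGTCCC GCTGGTCAAG) it yields MGPTSVPLVK.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); index = 16*first + 4*second + third
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds, stop_at_stop=True):
    """Translate a coding sequence in frame 1; '*' marks a stop codon."""
    protein = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        # codons with non-ACGT characters (e.g. OCR damage) become 'X'
        aa = CODON_TABLE.get(cds[pos:pos + 3].upper(), "X")
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)
```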



AAB3 DNA sequence 

Gene name: Solute carrier family 20 (phosphate transporter), member 1, Human leukaemia virus receptor 1 (GLVR1)

Unigene number: Hs.78452 

Probeset Accession #: L20859 

Nucleic Acid Accession #: NM_005415 cluster 

Coding sequence: predicted 371-2410 (predicted start/stop codons underlined) 

GAGCTGTCCC CGGTGCCGCC GACCCGGGCC GTGCCGTGTG CCCGTGGCTC CAGCCGCTGC 60 

CGCCTCGATC TCCTCGTCTC CCGCTCCGCC CTCCCTTTTC CCTGGATGAA CTTGCGTCCT 120

TTCTCTTCTC CGCCATGGAA TTCTGCTCCG TGCTTTTAGC CCTCCTGAGC CAAAGAAACC 180

CCAGACAACA GATGCCCATA CGCAGCGTAT AGCAGTAACT CCCCAGCTCG GTTTCTGTGC 240

CGTAGTTTAC AGTATTTAAT TTTATATAAT ATATATTATT TATTATAGCA TTTTTGATAC 300 

CTCATATTCT GTTTACACAT CTTGAAAGGC GCTCAGTAGT TCTCTTACTA AACAACCACT 360 

ACTCCAGAGA ATGGCAACGC TGATTACCAG TACTACAGCT GCTACCGCCG CTTCTGGTCC 420 

TTTGGTGGAC TACCTATGGA TGCTCATCCT GGGCTTCATT ATTGCATTTG TCTTGGCATT 480 

CTCCGTGGGA GCCAATGATG TAGCAAATTC TTTTGGTACA GCTGTGGGCT CAGGTGTAGT 540 

GACCCTGAAG CAAGCCTGCA TCCTAGCTAG CATCTTTGAA ACAGTGGGCT CTGTCTTACT 600

GGGGGCCAAA GTGAGCGAAA CCATCCGGAA GGGCTTGATT GACGTGGAGA TGTACAACTC 660 

GACTCAAGGG CTACTGATGG CCGGCTCAGT CAGTGCTATG TTTGGTTCTG CTGTGTGGCA 720 

ACTCGTGGCT TCGTTTTTGA AGCTCCCTAT TTCTGGAACC CATTGTATTG TTGGTGCAAC 780 

TATTGGTTTC TCCCTCGTGG CAAAGGGGCA GGAGGGTGTC AAGTGGTCTG AACTGATAAA 840 

AATTGTGATG TCTTGGTTCG TGTCCCCACT GCTTTCTGGA ATTATGTCTG GAATTTTATT 900 

CTTCCTGGTT CGTGCATTCA TCCTCCATAA GGCAGATCCA GTTCCTAATG GTTTGCGAGC 960 

TTTGCCAGTT TTCTATGCCT GCACAGTTGG AATAAACCTC TTTTCCATCA TGTATACTGG 1020 

AGCACCGTTG CTGGGCTTTG ACAAACTTCC TCTGTGGGGT ACCATCCTCA TCTCGGTGGG 1080 

ATGTGCAGTT TTCTGTGCCC TTATCGTCTG GTTCTTTGTA TGTCCCAGGA TGAAGAGAAA 1140 

AATTGAACGA GAAATAAAGT GTAGTCCTTC TGAAAGCCCC TTAATGGAAA AAAAGAATAG 1200 

CTTGAAAGAA GACCATGAAG AAACAAAGTT GTCTGTTGGT GATATTGAAA ACAAGCATCC 1260

TGTTTCTGAG GTAGGGCCTG CCACTGTGCC CCTCCAGGCT GTGGTGGAGG AGAGAACAGT 1320 

CTCATTCAAA CTTGGAGATT TGGAGGAAGC TCCAGAGAGA GAGAGGCTTC CCAGCGTGGA 1380 

CTTGAAAGAG GAAACCAGCA TAGATAGCAC CGTGAATGGT GCAGTGCAGT TGCCTAATGG 1440

GAACCTTGTC CAGTTCAGTC AAGCCGTCAG CAACCAAATA AACTCCAGTG GCCACTCCCA 1500 

GTATCACACC GTGCATAAGG ATTCCGGCCT GTACAAAGAG CTACTCCATA AATTACATCT 1560

TGCCAAGGTG GGAGATTGCA TGGGAGACTC CGGTGACAAA CCCTTAAGGC GCAATAATAG 1620 

CTATACTTCC TATACCATGG CAATATGTGG CATGCCTCTG GATTCATTCC GTGCCAAAGA 1680 

AGGTGAACAG AAGGGCGAAG AAATGGAGAA GCTGACATGG CCTAATGCAG ACTCCAAGAA 1740

GCGAATTCGA ATGGACAGTT ACACCAGTTA CTGCAATGCT GTGTCTGACC TTCACTCAGC 1800 

ATCTGAGATA GACATGAGTG TCAAGGCAGC GATGGGTCTA GGTGACAGAA AAGGAAGTAA 1860 

TGGCTCTCTA GAAGAATGGT ATGAC^GGA TAAGCCTGAA GTCTCTCTCC TCTTCCAGTT 1920 

CCTGCAGATC CTTACAGCCT GCTTTO'JGTC ATTCGCCCAT GGTGGCAATG ACGTAAGCAA 1980 

TGCCATTGGG CCTCTGGTTG CTTTATATTT GGTTTATGAC ACAGGAGATG TTTCTTCAAA 2040

AGTGGCAACA CCAATATGGC TTCTACTCTA TGGTGGTGTT GGTATCTGTG TTGGTCTGTG 2100 

GGTTTGGGGA AGAAGAGTTA TCCAGACCAT GGGGAAGGAT CTGACACCGA TCACACCCTC 2160

TAGTGGCTTC AGTATTGAAC TGGCATCTGC CCTCACTGTG GTGATTGCAT CAAATATTGG 2220 

CCTTCCCATC AGTACAACAC ATTGTAAAGT GGGCTCTGTT GTGTCTGTTG GCTGGCTCCG 2280 

GTCCAAGAAG GCTGTTGACT GGCGTCTCTT TCGTAACATT TTTATGGCCT GGTTTGTCAC 2340 

AGTCCCCATT TCTGGAGTTA TCAGTGCTGC CATCATGGCA ATCTTCAGAT ATGTCATCCT 2400

CAGAATGTGA AGCTGTTTGA GATTAAAATT TGTGTCAATG TTTGGGACCA TCTTAGGTAT 2460
TCCTGCTCCC CTGAAGAATG ATTACAGTGT TAACAGAAGA CTGACAAGAG TCTTTTTATT 2520

TGGGAGCAGA GGAGGGAAGT GTTACTTGTG CTATAACTGC TTTTGTGCTA AATATGAATT 2580 

GTCTCAAAAT TAGCTGTGTA AAATAGCCCG GGTTCCACTG GCTCCTGCTG AGGTCCCCTT 2640 

TCCTTCTGGG CTGTGAATTC CTGTACATAT TTCTCTACTT TTTGTATCAG GCTTCAATTC 2700 

CATTATGTTT TAATGTTGTC TCTGAAGATG ACTTGTGATT TTTTTTTCTT TTTTTTAAAC 2760 

CATGAAGAGC CGTTTGACAG AGCATGCTCT GCGTTGTTGG TTTCACCAGC TTCTGCCCTC 2820

ACATGCACAG GGATTTAACA ACAAAAATAT AACTACAACT TCCCTTGTAG TCTCTTATAT 2880 

AAGTAGAGTC CTTGGTACTC TGCCCTCCTG TCAGTAGTGG CAGGATCTAT TGGCATATTC 2940 

GGGAGCTTCT TAGAGGGATG AGGTTCTTTG AACACAGTGA AAATTTAAAT TAGTAACTTT 3000 

TTTGCAAGCA GTTTATTGAC TGTTATTGCT AAGAAGAAGT AAGAAAGAAA AAGCCTGTTG 3060

GCAATCTTGG TTATTTCTTT AAGATTTCTG GCAGTGTGGG ATGGATGAAT GAAGTGGAAT 3120 

GTGAACTTTG GGCAAGTTAA ATGGGACAGC CTTCCATGTT CATTTGTCTA CCTCTTAACT 3180 
GAATAAAAAA GCCTACAGTT TTTAGAAAAA ACCCGAATTC 
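The "Coding sequence" line of each record gives a 1-based, inclusive range, sometimes prefixed with "predicted". A small parser can pull out the range and confirm it spans a whole number of codons (stop codon included). The regular expression and function names below are illustrative, written against the header wording used in this table.

```python
import re

CDS_PATTERN = re.compile(r"Coding sequence:\s*(?:predicted\s+)?(\d+)-(\d+)")


def parse_cds_range(header_line):
    """Return (start, stop) from a 'Coding sequence:' line, or None if absent."""
    match = CDS_PATTERN.search(header_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def codon_count(start, stop):
    """Number of codons in start..stop; raises if the span is not in frame."""
    length = stop - start + 1
    if length % 3:
        raise ValueError(f"{start}-{stop} is not a whole number of codons")
    return length // 3
```

For the AAB3 record above, "Coding sequence: predicted 371-2410" gives 2040 bases, or 680 codons including the stop codon. Every range in this part of Table 1 (690, 1146, 2040, 1431, 1587, 1482 and 1941 bases) passes the frame check.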



AAB4 DNA sequence 

Gene name: Matrix metalloproteinase 10 (stromelysin 2)
Unigene number: Hs.2258 
Probeset Accession #: X07820 
Nucleic Acid Accession #: NM_002425 

Coding sequence: predicted 23-1453 (predicted start/stop codons underlined) 



AAAGAAGGTA AGGGCAGTGA GAATGATGCA TCTTGCATTC CTTGTGCTGT TGTGTCTGCC 60

AGTCTGCTCT GCCTATCCTC TGAGTGGGGC AGCAAAAGAG GAGGACTCCA ACAAGGATCT 120 

TGCCCAGCAA TACCTAGAAA AGTACTACAA CCTCGAAAAG GATGTGAAAC AGTTTAGAAG 180 

AAAGGACAGT AATCTCATTG TTAAAAAAAT CCAAGGAATG CAGAAGTTCC TTGGGTTGGA 240 

GGTGACAGGG AAGCTAGACA CTGACACTCT GGAGGTGATG CGCAAGCCCA GGTGTGGAGT 300

TCCTGACGTT GGTCACTTCA GCTCCTTTCC TGGCATGCCG AAGTGGAGGA AAACCCACCT 360 

TACATACAGG ATTGTGAATT ATACACCAGA TTTGCCAAGA GATGCTGTTG ATTCTGCCAT 420

TGAGAAAGCT CTGAAAGTCT GGGAAGAGGT GACTCCACTC ACATTCTCCA GGCTGTATGA 480

AGGAGAGGCT GATATAATGA TCTCTTTCGC AGTTAAAGAA CATGGAGACT TTTACTCTTT 540 

TGATGGCCCA GGACACAGTT TGGCTCATGC CTACCCACCT GGACCTGGGC TTTATGGAGA 600 

TATTCACTTT GATGATGATG AAAAATGGAC AGAAGATGCA TCAGGCACCA ATTTATTCCT 660 

CGTTGCTGCT CATGAACTTG GCCACTCCCT GGGGCTCTTT CACTCAGCCA ACACTGAAGC 720 

TTTGATGTAC CCACTCTACA ACTCATTCAC AGAGCTCGCC CAGTTCCGCC TTTCGCAAGA 780 

TGATGTGAAT GGCATTCAGT CTCTCTACGG ACCTCCCCCT GCCTCTACTG AGGAACCCCT 840 

GGTGCCCACA AAATCTGTTC CTTCGGGATC TGAGATGCCA GCCAAGTGTG ATCCTGCTTT 900 

GTCCTTCGAT GCCATCAGCA CTCTGAGGGG AGAATATCTG TTCTTTAAAG ACAGATATTT 960 

TTGGCGAAGA TCCCACTGGA ACCCTGAACC TGAATTTCAT TTGATTTCTG CATTTTGGCC 1020

CTCTCTTCCA TCATATTTGG ATGCTGCATA TGAAGTTAAC AGCAGGGACA CCGTTTTTAT 1080 

TTTTAAAGGA AATGAGTTCT GGGCCATCAG AGGAAATGAG GTACAAGCAG GTTATCCAAG 1140 

AGGCATCCAT ACCCTGGGTT TTCCTCCAAC CATAAGGAAA ATTGATGCAG CTGTTTCTGA 1200 

CAAGGAAAAG AAGAAAACAT ACTTCTTTGC AGCGGACAAA TACTGGAGAT TTGATGAAAA 1260 

TAGCCAGTCC ATGGAGCAAG GCTTCCCTAG ACTAATAGCT GATGACTTTC CAGGAGTTGA 1320 

GCCTAAGGTT GATGCTGTAT TACAGGCATT TGGATTTTTC TACTTCTTCA GTGGATCATC 1380 

ACAGTTTGAG TTTGACCCCA ATGCCAGGAT GGTGACACAC ATATTAAAGA GTAACAGCTG 1440 

GTTACATTGC TAGGCGAGAT AGGGGGAAGA CAGATATGGG TGTTTTTAAT AAATCTAATA 1500

ATTATTCATC TAATGTATTA TGAGCCAAAA TGGTTAATTT TTCCTGCATG TTCTGTGACT 1560

GAAGAAGATG AGCCTTGCAG ATATCTGCAT GTGTCATGAA GAATGTTTCT GGAATTCTTC 1620 

ACTTGCTTTT GAATTGCACT GAACAGAATT AAGAAATACT CATGTGCAAT AGGTGAGAGA 1680

ATGTATTTTC ATAGATGTGT TATTACTTCC TCAATAAAAA GTTTTATTTT GGGCCTGTTC 1740 
CTT 



AAB6 DNA sequence 
Gene name: Podocalyxin-like 
Unigene number: Hs.16426
Probeset Accession #: U97519 

Nucleic Acid Accession #: NM_005397 cluster 

Coding sequence: 251-1837 (predicted start/stop codons underlined)

AAACGCCGCC CAGGACGCAG CCGCCGCCGC CGCCGCTCCT CTGCCACTGG CTCTGCGCCC 60 

CAGCCCGGCT CTGCTGCAGC GGCAGGGAGG AAGAGCCGCC GCAGCGCGAC TCGGGAGCCC 120 

CGGGCCACAG CCTGGCCTCC GGAGCCACCC ACAGGCCTCC CCGGGCGGCG CCCACGCTCC 180 

TACCGCCCGG ACGCGCGGAT CCTCCGCCGG CACCGCAGCC ACCTGCTCCC GGCCCAGAGG 240 

CGACGACACG ATGCGCTGCG CGCTGGCGCT CTCGGCGCTG CTGCTACTGT TGTCAACGCC 300 

GCCGCTGCTG CCGTCGTCGC CGTCGCCGTC GCCGTCGCCG TCGCCCTCCC AGAATGCAAC 360 

CCAGACTACT ACGGACTCAT CTAACAAAAC AGCACCGACT CCAGCATCCA GTGTCACCAT 420
CATGGCTACA GATACAGCCC AGCAGAGCAC AGTCCCCACT TCCAAGGCCA ACGAAATCTT 480

GGCCTCGGTC AAGGCGACCA CCCTTGGTGT ATCCAGTGAC TCACCGGGGA CTACAACCCT 540 

GGCTCAGCAA GTCTCAGGCC CAGTCAACAC TACCGTGGCT AGAGGAGGCG GCTCAGGCAA 600 

CCCTACTACC ACCATCGAGA GCCCCAAGAG CACAAAAAGT GCAGACACCA CTACAGTTGC 660 

AACCTCCACA GCCACAGCTA AACCTAACAC CACAAGCAGC CAGAATGGAG CAGAAGATAC 720 

AACAAACTCT GGGGGGAAAA GCAGCCACAG TGTGACCACA GACCTCACAT CCACTAAGGC 780

AGAACATCTG ACGACCCCTC ACCCTACAAG TCCACTTAGC CCCCGACAAC CCACTTTGAC 840 

GCATCCTGTG GCCACCCCAA CAAGCTCGGG ACATGACCAT CTTATGAAAA TTTCAAGCAG 900

TTCAAGCACT GTGGCTATCC CTGGCTACAC CTTCACAAGC CCGGGGATGA CCACCACCCT 960 

ACCGTCATCG GTTATCTCGC AAAGAACTCA ACAGACCTCC AGTCAGATGC CAGCCAGCTC 1020 

TACGGCCCCT TCCTCCCAGG AGACAGTGCA GCCCACGAGC CCGGCAACGG CATTGAGAAC 1080 

ACCTACCCTG CCAGAGACCA TGAGCTCCAG CCCCACAGCA GCATCAACTA CCCACCGATA 1140 

CCCCAAAACA CCTTCTCCCA CTGTGGCTCA TGAGAGTAAC TGGGCAAAGT GTGAGGATCT 1200

TGAGACACAG ACACAGAGTG AGAAGCAGCT CGTCCTGAAC CTCACAGGAA ACACCCTCTG 1260 

TGCAGGGGGC GCTTCGGATG AGAAATTGAT CTCACTGATA TGCCGAGCAG TCAAAGCCAC 1320 

CTTCAACCCG GCCCAAGATA AGTGCGGCAT ACGGCTGGCA TCTGTTCCAG GAAGTCAGAC 1380

CGTGGTCGTC AAAGAAATCA CTATTCACAC TAAGCTCCCT GCCAAGGATG TGTACGAGCG 1440

GCTGAAGGAC AAATGGGATG AACTAAAGGA GGCAGGGGTC AGTGACATGA AGCTAGGGGA 1500 

CCAGGGGCCA CCGGAGGAGG CCGAGGACCG CTTCAGCATG CCCCTCATCA TCACCATCGT 1560 

CTGCATGGCG TCATTCCTGC TCCTCGTGGC GGCCCTCTAT GGCTGCTGCC ACCAGCGCCT 1620 

CTCCCAGAGG AAGGACCAGC AGCGGCTAAC AGAGGAGCTG CAGACAGTGG AGAATGGTTA 1680 

CCATGACAAC CCAACACTGG AAGTGATGGA GACCTCTTCT GAGATGCAGG AGAAGAAGGT 1740 

GGTCAGCCTC AACGGGGAGC TGGGGGACAG CTGGATCGTC CCTCTGGACA ACCTGACCAA 1800 

GGACGACCTG GATGAGGAGG AAGACACACA CCTCTAGTCC GGTCTGCCGG TGGCCTCCAG 1860 

CAGCACCACA GAGCTCCAGA CCAACCACCC CAAGTGCCGT TTGGATGGGG AAGGGAAAGA 1920 

CTGGGGAGGG AGAGTGAACT CCGAGGGGTG TCCCCTCCCA ATCCCCCCAG GGCCTTAATT 1980 

TTTCCCTTTT CAACCTGAAC AAATCACATT CTGTCCAGAT TCCTCTTGTA AAATAACCCA 2040 

CTAGTGCCTG AGCTCAGTGC TGCTGGATGA TGAGGGAGAT CAAGAAAAAG CCACGTAAGG 2100 
GACTTTATAG ATGAACTAGT GGAATCCCTT CATTCTGCAG TGAGATTGCC GAGACCTGAA 2160

GAGGGTAAGT GACTTGCCCA AGGTCAGAGC CACTTGGTGA CAGAGCCAGG ATGAGAACAA 2220

AGATTCCATT TGCACCATGC CACACTGCTG TGTTCACATG TGCCTTCCGT CCAGAGCAGT 2280 

CCCGGGCAGG GGTGAAACTC CAGCAGGTGG CTGGGCTGGA AAGGAGGGCA GGGCTACATC 2340 

CTGGCTCGGT GGGATCTGAC GACCTGAAAG TCCAGCTCCC AAGTTTTCCT TCTCCTACCC 2400 

CAGCCTCGTG TACCCATCTT CCCACCCTCT ATGTTCTTAC CCCTCCCTAC ACTCAGTGTT 2460 

TGTTCCCACT TACTCTGTCC TGGGGCCTCT GGGATTAGCA CAGGTTATTC ATAACCTTGA 2520

ACCCCTTGTT CTGGATTCGG ATTTTCTCAC ATTTGCTTCG TGAGATGGGG GCTTAACCCA 2580 

CACAGGTCTC CGTGCGTGAA CCAGGTCTGC TTAGGGGACC TGCGTGCAGG TGAGGAGAGA 2640

AGGGGACACT CGAGTCCAGG CTGGTATCTC AGGGCAGCTG ATGAGGGGTC AGCAGGAACA 2700 

CTGGCCCATT GCCCCTGGCA CTCCTTGCAG AGGCCACCCA CGATCTTCTT TGGGCTTCCA 2760 

TTTCCACCAG GGACTAAAAT CTGCTGTAGC TAGTGAGAGC AGCGTGTTCC TTTTGTTGTT 2820 

CACTGCTCAG CTGATGGGAG TGATTCCCTG AGACCCAGTA TGAAAGAGCA GTGGCTGCAG 2880 

GAGAGGCCTT CCCGGGGCCC CCCATCAGCG ATGTGTCTTC AGAGACAATC CATTAAAGCA 2940

GCCAGGAAGG ACAGGCTTTC CCCTGTATAT CATAGGAAAC TCAGGGACAT TTCAAGTTGC 3000 

TGAGAGTTTT GTTATAGTTG TTTTCTAACC CAGCCCTCCA CTGCCAAAGG CCAAAAGCTC 3060 

AGACAGTTGG CAGACGTCCA GTTAGCTCAT CTCACTCACT CTGATTCTCC TGTGCCACAG 3120 

GAAAAGAGGG CCTGGAAAGC GCAGTGCATG CTGGGTGCAT GAAGGGCAGC CTGGGGGACA 3180

GACTGTTGTG GGAACGTCCC ACTGTCCTGG CCTGGAGCTA GGCCTTGCTG TTCCTCTTCT 3240 

CTGTGAGCCT AGTGGGGCTG CTGCGGTTCT CTTGCAGTTT CTGGTGGCAT CTCAGGGGAA 3300

CACAAAAGCT ATGTCTATTC CCCAATATAG GACTTTTATG GGCTCGGCAG TTAGCTGCCA 3360

TGTAGAAGGC TCCTAAGCAG TGGGCATGGT GAGGTTTCAT CTGATTGAGA AGGGGGAATC 3420 

CTGTGTGGAA TGTTGAACTT TCGCCATGGT CTCCATCGTT CTGGGCGTAA ATTCCCTGGG 3480 

ATCAAGTAGG AAAATGGGCA GAACTGCTTA GGGGAATGAA ATTGCCATTT TTCGGGTGAA 3540

ACGCCACACC TCCAGGGTCT TAAGAGTCAG GCTCCGGCTG TAGTAGCTCT GATGAAATAG 3600

GCTATCCACT CGGGATGGCT TACTTTTTAA AAGGGTAGGG GGAGGGGCTG GGGAAGATCT 3660 

GTCCTGCACC ATCTGCCTAA TTCCTTCCTC ACAGTCTGTA GCCATCTGAT ATCCTAGGGG 3720

GAAAAGGAAG GCCAGGGGTT CACATAGGGC CCCAGCGAGT TTCCCAGGAG TTAGAGGGAT 3780 

GCGAGGCTAA CAAGTTCCAA AAACATCTGC CCCGATGCTC TAGTGTTTGG AGGTGGGCAG 3840

GATGGAGAAC AGTGCCTGTT TGGGGGAAAA CAGGAAATCT TGTTAGGCTT GAGTGAGGTG 3900

TTTGCTTCCT TCTTGCCCAG CGCTGGGTTC TCTCCACCCA GTAGGTTTTC TGTTGTGGTC 3960

CCGTGGGAGA GGCCAGACTG GATTATTCCT CCTTTGCTGA TCCTGGGTCA CACTTCACCA 4020 

GCCAGGGCTT TTGACGGAGA CAGCAAATAG GCCTCTGCAA ATCAATCAAA GGCTGCAACC 4080

CTATGGCCTC TTGGAGACAG ATGATGACTG GCAAGGACTA GAGAGCAGGA GTGCCTGGCC 4140

AGGTCGGTCC TGACTCTCCT GACTCTCCAT CGCTCTGTCC AAGGAGAACC CGGAGAGGCT 4200

CTGGGCTGAT TCAGAGGTTA CTGCTTTATA TTCGTCCAAA CTGTGTTAGT CTAGGCTTAG 4260

GACAGCTTCA GAATCTGACA CCTTGCCTTG CTCTTGCCAC CAGGACACCT ATGTCAACAG 4320 

GCCAAACAGC CATGCATCTA TAAAGGTCAT CATCTTCTGC CACCTTTACT GGGTTCTAAA 4380

TGCTCTCTGA TAATTCAGAG AGCATTGGGT CTGGGAAGAG GTAAGAGGAA CACTAGAAGC 4440

TCAGCATGAC TTAAACAGGT TGTAGCAAAG ACAGTTTATC ATCAACTCTT TCAGTGGTAA 4500
ACTGTGGTTT CCCCAAGCTG CACAGGAGGC CAGAAACCAC AAGTATGATG ACTAGGAAGC 4560

CTACTGTCAT GAGAGTGGGG AGACAGGCAG CAAAGCTTAT GAAGGAGGTA CAGAATATTC 4620 

TTTGCGTTGT AAGACAGAAT ACGGGTTTAA TCTAGTCTAG GCRCCAGATT TTTTTCCCGC 4680 

TTGATAAGGA AAGCTAGCAG AAAGTTTATT TAAACCACTT CTTGAGCTTT ATCTTTTTTG 4740 

ACAATATACT GGAGAAACTT TGAAGAACAA GTTCAAACTG ATACATATAC ACATATTTTT 4800

TTGATAATGT AAATACAGTG ACCATGTTAA CCTACCCTGC ACTGCTTTAA GTGAACATAC 4860

TTTGAAAAAG CATTATGTTA GCTGAGTGAT GGCCAAGTTT TTTCTCTGGA CAGGAATGTA 4920

AATGTCTTAC TGGAAATGAC AAGTTTTTGC TTGATTTTTT TTTTTAAACA AAAAATGAAA 4980

TATAACAAGA CAAACTTATG ATAAAGTATT TGTCTTGTAG ATCAGGTGTT TTGTTTTGTT 5040 

TTTTTAATTT TAAAATGCAA CCCTGCCCCC TCCCCAGCAA AGTCACAGCT CCATTTCAGT 5100 

AAAGGTTGGA GTCAATATGC TCTGGTTGGC AGGCAACCCT GTAGTCATGG AGAAAGGTAT 5160 

TTCAAGATCT AGTCCAATCT TTTTCTAGAG AAAAAGATAA TCTGAAGCTC ACAAAGATGA 5220 

AGTGACTTCC TCAAAATCAC ATGGTTCAGG ACAGAAACAA GATTAAAACC TGGATCCACA 5280

GACTGTGCGC CTCAGAAGGA ATAATCGGTA AATTAAGAAT TGCTACTCGA AGGTGCCAGA 5340 

ATGACACAAA GGACAGAATT CCTTTCCCAG TTGTTACCCT AGCAAGGCTA GGGAGGGCAT 5400 

GAACACAAAC ATAAGAACTG GTCTTCTCAC ACTTTCTCTG AATCATTTAG GTTTAAGATG 5460 

TAAGTGAACA ATTCTTTCTT TCTGCCAAGA AACAAAGTTT TGGATGAGCT TTTATATATG 5520 

GAACTTACTC CAACAGGACT GAGGGACCAA GGAAACATGA TGGGGGAGGC AAGAGAGGGC 5580

AAAGAGTAAA ACTGTAGCAT AGCTTTTGTC ACGGTCACTA GCTGATCCCT CAGGTCTGCT 5640 

GCAAACACAG CATGGAGGAC ACAGATGACT CTTTGGTGTT GGTCTTTTTG TCTGCAGTGA 5700

ATGTTCAACA GTTTGCCCAG GAACTGGGGG ATCATATATG TCTTAGTGGA CAGGGGTCTG 5760 

AAGTACACTG GAATTTACTG AGAAACTTGT TTGTAAAAAC TATAGTTAAT AATTATTGCA 5820 
TTTTCTTACA AAAATATATT TTGGAAAATT GTATACTGTC AATTAAAGT 



AAB8 DNA sequence 

Gene name: EGF-containing fibulin-like extracellular matrix protein 1 
Unigene number: Hs.76224

Probeset Accession #: U03877

Nucleic Acid Accession #: NM_004105 Transcript variant 1 

Coding sequence: 150-1631 (predicted start/stop codons underlined) 

CTAGTATTCT ACTAGAACTG GAAGATTGCT CTCCGAGTTT TTTTTTTGTT ATTTTGTTAA 60 

AAAATAAAAA GCTTGAGCAG CAATTCATAT TACTGTCACA GGTATTTTTG CTGTGCTGTG 120 

CAAGGTAACT CTGCTAGCTA AGATTCACAA TGTTGAAAGC CCTTTTCCTA ACTATGCTGA 180 

CTCTGGCGCT GGTCAAGTCA CAGGACACCG AAGAAACCAT CACGTACACG CAATGCACTG 240 

ACGGATATGA GTGGGATCCT GTGAGACAGC AATGCAAAGA TATTGATGAA TGTGACATTG 300 

TCCCAGACGC TTGTAAAGGT GGAATGAAGT GTGTCAACCA CTATGGAGGA TACCTCTGCC 360 

TTCCGAAAAC AGCCCAGATT ATTGTCAATA ATGAACAGCC TCAGCAGGAA ACACAACCAG 420 

CAGAAGGAAC CTCAGGGGCA ACCACCGGGG TTGTAGCTGC CAGCAGCATG GCAACCAGTG 480

GAGTGTTGCC CGGGGGTGGT TTTGTGGCCA GTGCTGCTGC AGTCGCAGGC CCTGAAATGC 540 

AGACTGGCCG AAATAACTTT GTCATCCGGC GGAACCCAGC TGACCCTCAG CGCATTCCCT 600 

CCAACCCTTC CCACCGTATC CAGTGTGCAG CAGGCTACGA GCAAAGTGAA CACAACGTGT 660 

GCCAAGACAT AGACGAGTGC ACTGCAGGGA CGCACAACTG TAGAGCAGAC CAAGTGTGCA 720 

TCAATTTACG GGGATCCTTT GCATGTCAGT GCCCTCCTGG ATATCAGAAG CGAGGGGAGC 780

AGTGCGTAGA CATAGATGAA TGTACCATCC CTCCATATTG CCACCAAAGA TGCGTGAATA 840 

CACCAGGCTC ATTTTATTGC CAGTGCAGTC CTGGGTTTCA ATTGGCAGCA AACAACTATA 900 

CCTGCGTAGA TATAAATGAA TGTGATGCCA GCAATCAATG TGCTCAGCAG TGCTACAACA 960

TTCTTGGTTC ATTCATCTGT CAGTGCAATC AAGGATATGA GCTAAGCAGT GACAGGCTCA 1020 

ACTGTGAAGA CATTGATGAA TGCAGAACCT CAAGCTACCT GTGTCAATAT CAATGTGTCA 1080

ATGAACCTGG GAAATTCTCA TGTATGTGCC CCCAGGGATA CCAAGTGGTG AGAAGTAGAA 1140 

CATGTCAAGA TATAAATGAG TGTGAGACCA CAAATGAATG CCGGGAGGAT GAAATGTGTT 1200

GGAATTATCA TGGCGGCTTC CGTTGTTATC CACGAAATCC TTGTCAAGAT CCCTACATTC 1260 

TAACACCAGA GAACCGATGT GTTTGCCCAG TCTCAAATGC CATGTGCCGA GAACTGCCCC 1320

AGTCAATAGT CTACAAATAC ATGAGCATCC GATCTGATAG GTCTGTGCCA TCAGACATCT 1380 

TCCAGATACA GGCCACAACT ATTTATGCCA ACACCATCAA TACTTTTCGG ATTAAATCTG 1440

GAAATGAAAA TGGAGAGTTC TACCTACGAC AAACAAGTCC TGTAAGTGCA ATGCTTGTGC 1500 

TCGTGAAGTC ATTATCAGGA CCAAGAGAAC ATATCGTGGA CCTGGAGATG CTGACAGTCA 1560

GCAGTATAGG GACCTTCCGC ACAAGCTCTG TGTTAAGATT GACAATAATA GTGGGGCCAT 1620 

TTTCATTTTA_G "CTTTTCTA AGAGTCAACC ACAGGCATTT AAGTCAGCCA AAGAATATTG 1680 

TTACCTTAAA GCACTATTTT ATTTATAGAT ATATCTAGTG CATCTACATC TCTATACTGT 1740

ACACTCACCC ATAACAAACA ATTACACCAT GGTATAAAGT GGGCATTTAA TATGTAAAGA 1800 

TTCAAAGTTT GTCTTTATTA CTATATGTAA ATTAGACATT AATCCACTAA ACTGGTCTTC 1860 

TTCAAGAGAG CTAAGTATAC ACTATCTGGT GAAACTTGGA TTCTTTCCTA TAAAAGTGGG 1920

ACCAAGCAAT GATGATCTTC TGTGGTGCTT AAGGAAACTT ACTAGAGCTC CACTAACAGT 1980

CTCATAAGGA GGCAGCCATC ATAACCATTG AATAGCATGC AAGGGTAAGA ATGAGTTTTT 2040 

AACTGCTTTG TAAGAAAATG GAAAAGGTCA ATAAAGATAT ATTTCTTTAG AAAATGGGGA 2100

TCTGCCATAT TTGTGTTGGT TTTTATTTTC ATATCCAGCC TAAAGGTGGT TGTTTATTAT 2160 
ATAGTAATAA ATCATTGCTG TACAACATGC TGGTTTCTGT AGGGTATTTT TAATTTTGTC 2220

AGAAATTTTA GATTGTGAAT ATTTTGTAAA AAACAGTAAG CAAAATTTTC CAGAATTCCC 2280 

AAAATGAACC AGATACCCCC TAGAAAATTA TACTATTGAG AAATCTATGG GGAGGATATG 2340 

AGAAAATAAA TTCCTTCTAA ACCACATTGG AACTGACCTG AAGAAGCAAA CTCGGAAAAT 2400 

ATAATAACAT CCCTGAATTC AGGCATTCAC AAGATGCAGA ACAAAATGGA TAAAAGGTAT 2460 

TTCACTGGAG AAGTTTTAAT TTCTAAGTAA AATTTAAATC CTAACACTTC ACTAATTTAT 2520 

AACTAAAATT TCTCATCTTC GTACTTGATG CTCACAGAGG AAGAAAATGA TGATGGTTTT 2580 

TATTCCTGGC ATCCAGAGTG ACAGTGAACT TAAGCAAATT ACCCTCCTAC CCAATTCTAT 2640 

GGAATATTTT ATACGTCTCC TTGTTTAAAA TCTGACTGCT TTACTTTGAT GTATCATATT 2700 
TTTAAATAAA AATAAATATT CCTTTAGAAG ATCACTCTAA AA 



AAB9 DNA sequence 

Gene name: Melanoma adhesion molecule, MUC 18 glycoprotein 
Unigene number: Hs.211579
Probeset Accession #: M28882 

Nucleic Acid Accession #: NM_006500 cluster 

Coding sequence: 27-1967 (predicted start/stop codons underlined) 

ACTTGCGTCT CGCCCTCCGG CCAAGCATGG GGCTTCCCAG GCTGGTCTGC GCCTTCTTGC 60

TCGCCGCCTG CTGCTGCTGT CCTCGCGTCG CGGGTGTGCC CGGAGAGGCT GAGCAGCCTG 120 

CGCCTGAGCT GGTGGAGGTG GAAGTGGGCA GCACAGCCCT TCTGAAGTGC GGCCTCTCCC 180 

AGTCCCAAGG CAACCTCAGC CATGTCGACT GGTTTTCTGT CCACAAGGAG AAGCGGACGC 240 

TCATCTTCCG TGTGCGCCAG GGCCAGGGCC AGAGCGAACC TGGGGAGTAC GAGCAGCGGC 300 

TCAGCCTCCA GGACAGAGGG GCTACTCTGG CCCTGACTCA AGTCACCCCC CAAGACGAGC 360

GCATCTTCTT GTGCCAGGGC AAGCGCCCTC GGTCCCAGGA GTACCGCATC CAGCTCCGCG 420 

TCTACAAAGC TCCGGAGGAG CCAAACATCC AGGTCAACCC CCTGGGCATC CCTGTGAACA 480 

GTAAGGAGCC TGAGGAGGTC GCTACCTGTG TAGGGAGGAA CGGGTACCCC ATTCCTCAAG 540 

TCATCTGGTA CAAGAATGGC CGGCCTCTGA AGGAGGAGAA GAACCGGGTC CACATTCAGT 600 

CGTCCCAGAC TGTGGAGTCG AGTGGTTTGT ACACCTTGCA GAGTATTCTG AAGGCACAGC 660

TGGTTAAAGA AGACAAAGAT GCCCAGTTTT ACTGTGAGCT CAACTACCGG CTGCCCAGTG 720 

GGAACCACAT GAAGGAGTCC AGGGAAGTCA CCGTCCCTGT TTTCTACCCG ACAGAAAAAG 780 

TGTGGCTGGA AGTGGAGCCC GTGGGAATGC TGAAGGAAGG GGACCGCGTG GAAATCAGGT 840 

GTTTGGCTGA TGGCAACCCT CCACCACACT TCAGCATCAG CAAGCAGAAC CCCAGCACCA 900 

GGGAGGCAGA GGAAGAGACA ACCAACGACA ACGGGGTCCT GGTGCTGGAG CCTGCCCGGA 960 

AGGAACACAG TGGGCGCTAT GAATGTCAGG CCTGGAACTT GGACACCATG ATATCGCTGC 1020 

TGAGTGAACC ACAGGAACTA CTGGTGAACT ATGTGTCTGA CGTCCGAGTG AGTCCCGCAG 1080 

CCCCTGAGAG ACAGGAAGGC AGCAGCCTCA CCCTGACCTG TGAGGCAGAG AGTAGCCAGG 1140 

ACCTCGAGTT CCAGTGGCTG AGAGAAGAGA CAGACCAGGT GCTGGAAAGG GGGCCTGTGC 1200 

TTCAGTTGCA TGACCTGAAA CGGGAGGCAG GAGGCGGCTA TCGCTGCGTG GCGTCTGTGC 1260 

CCAGCATACC CGGCCTGAAC CGCACACAGC TGGTCAAGCT GGCCATTTTT GGCCCCCCTT 1320 

GGATGGCATT CAAGGAGAGG AAGGTGTGGG TGAAAGAGAA TATGGTGTTG AATCTGTCTT 1380

GTGAAGCGTC AGGGCACCCC CGGCCCACCA TCTCCTGGAA CGTCAACGGC ACGGCAAGTG 1440 

AACAAGACCA AGATCCACAG CGAGTCCTGA GCACCCTGAA TGTCCTCGTG ACCCCGGAGC 1500 

TGTTGGAGAC AGGTGTTGAA TGCACGGCCT CCAACGACCT GGGCAAAAAC ACCAGCATCC 1560 

TCTTCCTGGA GCTGGTCAAT TTAACCACCC TCACACCAGA CTCCAACACA ACCACTGGCC 1620 

TCAGCACTTC CACTGCCAGT CCTCATACCA GAGCCAACAG CACCTCCACA GAGAGAAAGC 1680 

TGCCGGAGCC GGAGAGCCGG GGCGTGGTCA TCGTGGCTGT GATTGTGTGC ATCCTGGTCC 1740 

TGGCGGTGCT GGGCGCTGTC CTCTATTTCC TCTATAAGAA GGGCAAGCTG CCGTGCAGGC 1800 

GCTCAGGGAA GCAGGAGATC ACGCTGCCCC CGTCTCGTAA GACCGAACTT GTAGTTGAAG 1860 

TTAAGTCAGA TAAGCTCCCA GAAGAGATGG GCCTCCTGCA GGGCAGCAGC GGTGACAAGA 1920 

GGGCTCCGGG AGACCAGGGA GAGAAATACA TCGATCTGAG GCATTAGCCC CGAATCACTT 1980

CAGCTCCCTT CCCTGCCTGG ACCATTCCCA GCTCCCTGCT CACTCTTCTC TCAGCCAAAG 2040 

CCTCCAAAGG GACTAGAGAG AAGCCTCCTG CTCCCCTCAC CTGCACACCC CCTTTCAGAG 2100

GGCCACTGGG TTAGGACCTG AGGACCTCAC TTGGCCCTGC AAGCCGCTTT TCAGGGACCA 2160

GTCCACCACC ATCTCCTCCA CGTTGAGTGA AGCTCATCCC AAGCAAGGAG CCCCAGTCTC 2220 

CCGAGCGGGT AGGAGAGTTT CTTGCAGAAC GTGTTTTTTC TTTACACACA TTATGGCTGT 2280

AAATACCTGG CTCCTGCCAG CAGCTGAGCT GGGTAGCCTC TCTGAGCTGG TTTCCTGCCC 2340 

CAAAGGCTGG CTTCCACCAT CCAGGTGCAC CACTGAAGTG AGGACACACC GGAGCCAGGC 2400

GCCTGCTCAT GTTGAAGTGC GCTGTTCACA CCJGCTCCGG AGAGCACCCC AGCGGCATCC 2460

AGAAGCAGCT GCAGTGTTGC TGCCACCACC CTuCTGCTCG CCTCTTCAAA GTCTCCTGTG 2520 

ACATTTTTTC TTTGGTCAGA AGCCAGGAAC TGGTGTCATT CCTTAAAAGA TACGTGCCGG 2580 

GGCCAGGTGT GGTGGCTCAC GCCTGTAATC CCAGCACTTT GGGAGGCCGA GGCGGGCGGA 2640

TCACAAAGTC AGGACGAGAC CATCCTGGCT AACACGGTGA AACCCTGTCT CTACTAAAAA 2700 

TACAAAAAAA AATTAGCTAG GCGTAGTGGT TGGCACCTAT AGTCCCAGCT ACTCGGAAGG 2760 

CTGAAGCAGG AGAATGGTAT GAATCCAGGA GGTGGAGCTT GCAGTGAGCC GAGACCGTGC 2820

CACTGCACTC CAGCCTGGGC AACACAGCGA GACTCCGTCT CGAGGAAAAA AAAAGAAAAG 2880

ACGCGTACCT GCGGTGAGGA AGCTGGGCGC TGTTTTCGAG TTCAGGTGAA TTAGCCTCAA 2940

TCCCCGTGTT CACTTGCTCC CATAGCCCTC TTGATGGATC ACGTAAAACT GAAAGGCAGC 3000

GGGGAGCAGA CAAAGATGAG GTCTACACTG TCCTTCATGG GGATTAAAGC TATGGTTATA 3060 

TTAGCACCAA ACTTCTACAA ACCAAGCTCA GGGCCCCAAC CCTAGAAGGG CCCAAATGAG 3120 

AGAATGGTAC TTAGGGATGG AAAACGGGGC CTGGCTAGAG CTTCGGGTGT GTGTGTCTGT 3180 

CTGTGTGTAT GCATACATAT GTGTGTATAT ATGGTTTTGT CAGGTGTGTA AATTTGCAAA 3240 

TTGTTTCCTT TATATATGTA TGTATATATA TATATGAAAA TATATATATA TATGAAAAAT 3300 

AAAGCTTAAT TGTCCCAGAA AATCATACAT TGCTTTTTTA TTCTACATGG GTACCACAGG 3360 

AACCTGGGGG CCTGTGAAAC TACAACCAAA AGGCACACAA AACCGTTTCC AGTTGGCAGC 3420 

AGAGATCAGG GGTTACCTCT GCTTCTGAGC AAATGGCTCA AGCTCTACCA GAGCAGACAG 3480 

CTACCCTACT TTTCAGCAGC AAAACGTCCC GTATGACGCA GCACGAAGGG CCTGGCAGGC 3540
TGTTAGCAGG AGCTATGTCC CTTCCTATCG TTTCCGTCCA CTT 



AAC1 DNA sequence 

Gene name: Matrix metalloproteinase 1 (interstitial collagenase) 
Unigene number: Hs. 83169 
Probeset Accession #: X54925 

Nucleic Acid Accession #: NM_002421 cluster 

Coding sequence: 69-1478 (predicted start/stop codons underlined)

ATATTGGAGT AGCAAGAGGC TGGGAAGCCA TCACTTACCT TGCACTGAGA AAGAAGACAA 60 

AGGCCAGTAT GCACAGCTTT CCTCCACTGC TGCTGCTGCT GTTCTGGGGT GTGGTGTCTC 120

ACAGCTTCCC AGCGACTCTA GAAACACAAG AGCAAGATGT GGACTTAGTC CAGAAATACC 180 

TGGAAAAATA CTACAACCTG AAGAATGATG GGAGGCAAGT TGAAAAGCGG AGAAATAGTG 240 

GCCCAGTGGT TGAAAAATTG AAGCAAATGC AGGAATTCTT TGGGCTGAAA GTGACTGGGA 300

AACCAGATGC TGAAACCCTG AAGGTGATGA AGCAGCCCAG ATGTGGAGTG CCTGATGTGG 360

CTCAGTTTGT CCTCACTGAG GGGAACCCTC GCTGGGAGCA AACACATCTG ACCTACAGGA 420 

TTGAAAATTA CACGCCAGAT TTGCCAAGAG CAGATGTGGA CCATGCCATT GAGAAAGCCT 480 

TCCAACTCTG GAGTAATGTC ACACCTCTGA CATTCACCAA GGTCTCTGAG GGTCAAGCAG 540 

ACATCATGAT ATCTTTTGTC AGGGGAGATC ATCGGGACAA CTCTCCTTTT GATGGACCTG 600

GAGGAAATCT TGCTCATGCT TTTCAACCAG GCCCAGGTAT TGGAGGGGAT GCTCATTTTG 660 

ATGAAGATGA AAGGTGGACC AACAATTTCA GAGAGTACAA CTTACATCGT GTTGCGGCTC 720 

ATGAACTCGG CCATTCTCTT GGACTCTCCC ATTCTACTGA TATCGGGGCT TTGATGTACC 780

CTAGCTACAC CTTCAGTGGT GATGTTCAGC TAGCTCAGGA TGACATTGAT GGCATCCAAG 840

CCATATATGG ACGTTCCCAA AATCCTGTCC AGCCCATCGG CCCACAAACC CCAAAAGCAT 900 

GTGACAGTAA GCTAACCTTT GATGCTATAA CTACGATTCG GGGAGAAGTG ATGTTCTTTA 960 

AAGACAGATT CTACATGCGC ACAAATCCCT TCTACCCGGA AGTTGAGCTC AATTTCATTT 1020

CTGTTTTCTG GCCACAACTG CCAAATGGGC TTGAAGCTGC TTACGAATTT GCCGACAGAG 1080 

ATGAAGTCCG GTTTTTCAAA GGGAATAAGT ACTGGGCTGT TCAGGGACAG AATGTGCTAC 1140 

ACGGATACCC CAAGGACATC TACAGCTCCT TTGGCTTCCC TAGAACTGTG AAGCATATCG 1200 

ATGCTGCTCT TTCTGAGGAA AACACTGGAA AAACCTACTT CTTTGTTGCT AACAAATACT 1260 

GGAGGTATGA TGAATATAAA CGATCTATGG ATCCAGGTTA TCCCAAAATG ATAGCACATG 1320

ACTTTCCTGG AATTGGCCAC AAAGTTGATG CAGTTTTCAT GAAAGATGGA TTTTTCTATT 1380

TCTTTCATGG AACAAGACAA TACAAATTTG ATCCTAAAAC GAAGAGAATT TTGACTCTCC 1440 

AGAAAGCTAA TAGCTGGTTC AACTGCAGGA AAAATTGAAC ATTACTAATT TGAATGGAAA 1500 

ACACATGGTG TGAGTCCAAA GAAGGTGTTT TCCTGAAGAA CTGTCTATTT TCTCAGTCAT 1560

TTTTAACCTC TAGAGTCACT GATACACAGA ATATAATCTT ATTTATACCT CAGTTTGCAT 1620 

ATTTTTTTAC TATTTAGAAT GTAGCCCTTT TTGTACTGAT ATAATTTAGT TCCACAAATG 1680 

GTGGGTACAA AAAGTCAAGT TTGTGGCTTA TGGATTCATA TAGGCCAGAG TTGCAAAGAT 1740 

CTTTTCCAGA GTATGCAACT CTGACGTTGA TCCCAGAGAG CAGCTTCAGT GACAAACATA 1800 

TCCTTTCAAG ACAGAAAGAG ACAGGAGACA TGAGTCTTTG CCGGAGGAAA AGCAGCTCAA 1860 

GAACACATGT GCAGTCACTG GTGTCACCCT GGATAGGCAA GGGATAACTC TTCTAACACA 1920
AAATAAGTGT TTTATGTTTG GAATAAAGTC AACCTTGTTT CTACTGTTTT 



AAC3 DNA sequence 

Gene name: Branched chain aminotransferase 1, cytosolic 

Unigene number: Hs. 157205 

Probeset Accession #: AA423987 

Nucleic Acid Accession #: NM_005504 cluster 

Coding sequence: 1-1155 (predicted start/stop codons underlined) 

ATGGATTGCA GTAACGGATC GGCAGAGTGT ACCGGAGAAG GAGGATCAAA AGAGGTGGTG 60 

GGGACTTTTA AGGCTAAAGA CCTAATAGTC ACACCAGCTA CCATTTTAAA GGAAAAACCA 120

GACCCCAATA ATCTGGTTTT TGGAACTGTG TTCACGGATC ATATGCTGAC GGTGGAGTGG 180 

TCCTCAGAGT TTGGATGGGA GAAACCTCAT ATCAAGCCTC TTCAGAACCT GTCATTGCAC 240

CCTGGCTCAT CAGCTTTGCA CTATGCAGTG GAATTATTTG AAGGATTGAA GGCATTTCGA 300

GGAGTAGATA ATAAAATTCG ACTGTTTCAG CCAAACCTCA ACATGGATAG AATGTATCGC 360

TCTGCTGTGA GGGCAACTCT GCCGGTATTT GACAAAGAAG AGCTCTTAGA GTGTATTCAA 420

CAGCTTGTGA AATTGGATCA AGAATGGGTC CCATATTCAA CATCTGCTAG TCTGTATATT 480

CGTCCTGCAT TCATTGGAAC TGAGCCTTCT CTTGGAGTCA AGAAGCCTAC CAAAGCCCTG 540 

CTCTTTGTAC TCTTGAGCCC AGTGGGACCT TATTTTTCAA GTGGAACCTT TAATCCAGTG 600 

TCCCTGTGGG CCAATCCCAA GTATGTAAGA GCCTGGAAAG GTGGAACTGG GGACTGCAAG 660

ATGGGAGGGA ATTACGGCTC ATCTCTTTTT GCCCAATGTG AAGACGTAGA TAATGGGTGT 720 

CAGCAGGTCC TGTGGCTCTA TGGCAGAGAC CATCAGATCA CTGAAGTGGG AACTATGAAT 780 

CTTTTTCTTT ACTGGATAAA TGAAGATGGA GAAGAAGAAC TGGCAACTCC TCCACTAGAT 840

GGCATCATTC TTCCAGGAGT GACAAGGCGG TGCATTCTGG ACCTGGCACA TCAGTGGGGT 900 

GAATTTAAGG TGTCAGAGAG ATACCTCACC ATGGATGACT TGACAACAGC CCTGGAGGGG 960

AACAGAGTGA GAGAGATGTT TAGCTCTGGT ACAGCCTGTG TTGTTTGCCC AGTTTCTGAT 1020 

ATACTGTACA AAGGCGAGAC AATACACATT CCAACTATGG AGAATGGTCC TAAGCTGGCA 1080 

AGCCGCATCT TGAGCAAATT AACTGATATC CAGTATGGAA GAGAAGAGAG CGACTGGACA 1140 
ATTGTGCTAT CCTGA 

ACG4 DNA sequence

Gene name: Pentaxin-related gene, rapidly induced by IL-1 beta 

Unigene number: Hs.2050 
Probeset Accession #: M31166
Nucleic Acid Accession #: NM_002852 cluster

Coding sequence: 68-1213 (predicted start/stop codons underlined)

CTCAAACTCA GCTCACTTGA GAGTCTCCTC CCGCCAGCTG TGGAAAGAAC TTTGCGTCTC 60 

TCCAGCAATG CATCTCCTTG CGATTCTGTT TTGTGCTCTC TGGTCTGCAG TGTTGGCCGA 120

GAACTCGGAT GATTATGATC TCATGTATGT GAATTTGGAC AACGAAATAG ACAATGGACT 180

CCATCCCACT GAGGACCCCA CGCCGTGCGA CTGCGGTCAG GAGCACTCGG AATGGGACAA 240

GCTCTTCATC ATGCTGGAGA ACTCGCAGAT GAGAGAGCGC ATGCTGCTGC AAGCCACGGA 300 

CGACGTCCTG CGGGGCGAGC TGCAGAGGCT GCGGGAGGAG CTGGGCCGGC TCGCGGAAAQ 360 

CCTGGCGAGG CCGTGCGCGC CGGGGGCTCC CGCAGAGGCC AGGCTGACCA GTGCTCTGGA 420

CGAGCTGCTG CAGGCGACCC GCGACGCGGG CCGCAGGCTG GCGCGTATGG AGGGCGCGGA 480

GGCGCAGCGC CCAGAGGAGG CGGGGCGCGC CCTGGCCGCG GTGCTAGAGG AGCTGCGGCA 540 

GACGCGAGCC GACCTGCACG CGGTGCAGGG CTGGGCTGCC CGGAGCTGGC TGCCGGCAGG 600 

TTGTGAAACA GCTATTTTAT TCCCAATGCG TTCCAAGAAG ATTTTTGGAA GCGTGCATCC 660 

AGTGAGACCA ATGAGGCTTG AGTCTTTTAG TGCCTGCATT TGGGTCAAAG CCACAGATGT 720

ATTAAACAAA ACCATCCTGT TTTCCTATGG CACAAAGAGG AATCCATATG AAATCCAGCT 780 

GTATCTCAGC TACCAATCCA TAGTGTTTGT GGTGGGTGGA GAGGAGAACA AACTGGTTGC 840 

TGAAGCCATG GTTTCCCTGG GAAGGTGGAC CCACCTGTGC GGCACCTGGA ATTCAGAGGA 900 

AGGGCTCACA TCCTTGTGGG TAAATGGTGA ACTGGCGGCT ACCACTGTTG AGATGGCCAC 960

AGGTCACATT GTTCCTGAGG GAGGAATCCT GCAGATTGGC CAAGAAAAGA ATGGCTGCTG 1020

TGTGGGTGGT GGCTTTGATG AAACATTAGC CTTCTCTGGG AGACTCACAG GCTTCAATAT 1080

CTGGGATAGT GTTCTTAGCA ATGAAGAGAT AAGAGAGACC GGAGGAGCAG AGTCTTGTCA 1140 

CATCCGGGGG AATATTGTTG GGTGGGGAGT CACAGAGATC CAGCCACATG GAGGAGCTCA 1200 

GTATGTTTCA TAAATGTTGT GAAACTCCAC TTGAAGCCAA AGAAAGAAAC TCACACTTAA 1260

AACACATGCC AGTTGGGAAG GTCTGAAAAC TCAGTGCATA ATAGGAACAC TTGAGACTAA 1320

TGAAAGAGAG AGTTGAGACC AATCTTTATT TGTACTGGCC AAATACTGAA TAAACAGTTG 1380 

AAGGAAAGAC ATTGGAAAAA GCTTTTGAGG ATAATGTTAC TAGACTTTAT GCCATGGTGC 1440

TTTCAGTTTA ATGCTGTGTC TGTGTCAGAT AAACTCTCAA ATAATTAAAA AGGACTGTAT 1500 

TGTTGAACAG AGGGACAATT GTTTTACTTT TCTTTGGTTA ATTTTGTTTT GGCCAGAGAT 1560 

GAATTTTACA TTGGAAGAAT AACAAAATAA GATTTGTTGT CCATTGTTCA TTGTTATTGG 1620

TATGTACCTT ATTACAAAAA AAATGATGAA AACATATTTA TACTACAAGG TGACTTAACA 1680 

ACTATAAATG TAGTTTATGT GTTATAATCG AATGTCACGT TTTTGAGAAG ATAGTCATAT 1740

AAGTTATATT GCAAAAGGGA TTTGTATTAA TTTAAGACTA TTTTTGTAAA GCTCTACTGT 1800
AAATAAAATA TTTTATAAAA CTAAAAAAAA AAAAAAA 




ACK5 DNA sequence 

Gene name: Von Willebrand factor; Coagulation factor VIII 
Unigene number: Hs.110802
Probeset Accession #: M10321

Nucleic Acid Accession #: NM_000552 

Coding sequence: 311-8752 (predicted start/stop codons underlined) 

AGCTCACAGC TATTGTGGTG GGAAAGGGAG GGTGGTTGGT GGATGTCACA GCTTGGGCTT 60

TATCTCCCCC AGCAGTGGGG ACTCCACAGC CCCTGGGCTA CATAACAGCA AGACAGTCCG 120

GAGCTGTAGC AGACCTGATT GAGCCTTTGC AGCAGCTGAG AGCATGGCCT AGGGTGGGCG 180

GCACCATTGT CCAGCAGCTG AGTTTCCCAG GGACCTTGGA GATAGCCGCA GCCCTCATTT 24 0 

GCAGGGGAAG GCACCATTGT CCAGCAGCTG AGTTTCCCAG GGACCTTGGA GATAGCCGCA 300 
GCCCTCATTT ATGATTCCTG CCAGATTTGC CGGGGTGCTG CTTGCTCTGG CCCTCATTTT 360

GCCAGGGACC CTTTGTGCAG AAGGAACTCG CGGCAGGTCA TCCACGGCCC GATGCAGCCT 420 

TTTCGGAAGT GACTTCGTCA ACACCTTTGA TGGGAGCATG TACAGCTTTG CGGGATACTG 480

CAGTTACCTC CTGGCAGGGG GCTGCCAGAA ACGCTCCTTC TCGATTATTG GGGACTTCCA 540 

GAATGGCAAG AGAGTGAGCC TCTCCGTGTA TCTTGGGGAA TTTTTTGACA TCCATTTGTT 600 

TGTCAATGGT ACCGTGACAC AGGGGGACCA AAGAGTCTCC ATGCCCTATG CCTCCAAAGG 660 

GCTGTATCTA GAAACTGAGG CTGGGTACTA CAAGCTGTCC GGTGAGGCCT ATGGCTTTGT 720 

GGCCAGGATC GATGGCAGCG GCAACTTTCA AGTCCTGCTG TCAGACAGAT ACTTCAACAA 780 

GACCTGCGGG CTGTGTGGCA ACTTTAACAT CTTTGCTGAA GATGACTTTA TGACCCAAGA 840 

AGGGACCTTG ACCTCGGACC CTTATGACTT TGCCAACTCA TGGGCTCTGA GCAGTGGAGA 900 

ACAGTGGTGT GAACGGGCAT CTCCTCCCAG CAGCTCATGC AACATCTCCT CTGGGGAAAT 960 

GCAGAAGGGC CTGTGGGAGC AGTGCCAGCT TCTGAAGAGC ACCTCGGTGT TTGCCCGCTG 1020 

CCACCCTCTG GTGGACCCCG AGCCTTTTGT GGCCCTGTGT GAGAAGACTT TGTGTGAGTG 1080 

TGCTGGGGGG CTGGAGTGCG CCTGCCCTGC CCTCCTGGAG TACGCCCGGA CCTGTGCCCA 1140 

GGAGGGAATG GTGCTGTACG GCTGGACCGA CCACAGCGCG TGCAGCCCAG TGTGCCCTGC 1200 

TGGTATGGAG TATAGGCAGT GTGTGTCCCC TTGCGCCAGG ACCTGCCAGA GCCTGCACAT 1260 

CAATGAAATG TGTCAGGAGC GATGCGTGGA TGGCTGCAGC TGCCCTGAGG GACAGCTCCT 1320 

GGATGAAGGC CTCTGCGTGG AGAGCACCGA GTGTCCCTGC GTGCATTCCG GAAAGCGCTA 1380

CCCTCCCGGC ACCTCCCTCT CTCGAGACTG CAACACCTGC ATTTGCCGAA ACAGCCAGTG 1440 

GATCTGCAGC AATGAAGAAT GTCCAGGGGA GTGCCTTGTC ACTGGTCAAT CCCACTTCAA 1500 

GAGCTTTGAC AACAGATACT TCACCTTCAG TGGGATCTGC CAGTACCTGC TGGCCCGGGA 1560 

TTGCCAGGAC CACTCCTTCT CCATTGTCAT TGAGACTGTC CAGTGTGCTG ATGACCGCGA 1620 

CGCTGTGTGC ACCCGCTCCG TCACCGTCCG GCTGCCTGGC CTGCACAACA GCCTTGTGAA 1680

ACTGAAGCAT GGGGCAGGAG TTG£CATGGA TGGCCAGGAC ATCCAGCTCC CCCTCCTGAA 1740 

AGGTGACCTC CGCATCCAGC ATACAGTGAC GGCCTCCGTG CGCCTCAGCT ACGGGGAGGA 1800 

CCTGCAGATG GACTGGGATG GCCGCGGGAG GCTGCTGGTG AAGCTGTCCC CCGTCTACGC 1860 

CGGGAAGACC TGCGGCCTGT GTGGGAATTA CAATGGCAAC CAGGGCGACG ACTTCCTTAC 1920 

CCCCTCTGGG CTGGCAGAGC CCCGGGTGGA GGACTTCGGG AACGCCTGGA AGCTGCACGG 1980 

GGACTGCCAG GACCTGCAGA AGCAGCACAG CGATCCCTGC GCCCTCAACC CGCGCATGAC 2040 

CAGGTTCTCC GAGGAGGCGT GCGCGGTCCT GACGTCCCCC ACATTCGAGG CCTGCCATCG 2100 

TGCCGTCAGC CCGCTGCCCT ACCTGCGGAA CTGCCGCTAC GACGTGTGCT CCTGCTCGGA 2160 

CGGCCGCGAG TGCCTGTGCG GCGCCCTGGC CAGCTATGCC GCGGCCTGCG CGGGGAGAGG 2220 

CGTGCGCGTC GCGTGGCGCG AGCCAGGCCG CTGTGAGCTG AACTGCCCGA AAGGCCAGGT 2280

GTACCTGCAG TGCGGGACCC CCTGCAACCT GACCTGCCGC TCTCTCTCTT ACCCGGATGA 2340 

GGAATGCAAT GAGGCCTGCC TGGAGGGCTG CTTCTGCCCC CCAGGGCTCT ACATGGATGA 2400 

GAGGGGGGAC TGCGTGCCCA AGGCCCAGTG CCCCTGTTAC TATGACGGTG AGATCTTCCA 2460

GCCAGAAGAC ATCTTCTCAG ACCATCACAC CATGTGCTAC TGTGAGGATG GCTTCATGCA 2520 

CTGTACCATG AGTGGAGTCC CCGGAAGCTT GCTGCCTGAC GCTGTCCTCA GCAGTCCCCT 2580 

GTCTCATCGC AGCAAAAGGA GCCTATCCTG TCGGCCCCCC ATGGTCAAGC TGGTGTGTCC 2640 

CGCTGACAAC CTGCGGGCTG AAGGGCTCGA GTGTACCAAA ACGTGCCAGA ACTATGACCT 2700 

GGAGTGCATG AGCATGGGCT GTGTCTCTGG CTGCCTCTGC CCCCCGGGCA TGGTCCGGCA 2760 

TGAGAACAGA TGTGTGGCCC TGGAAAGGTG TCCCTGCTTC CATCAGGGCA AGGAGTATGC 2820

CCCTGGAGAA ACAGTGAAGA TTGGCTGCAA CACTTGTGTC TGTCGGGACC GGAAGTGGAA 2880

CTGCACAGAC CATGTGTGTG ATGCCACGTG CTCCACGATC GGCATGGCCC ACTACCTCAC 2940 

CTTCGACGGG CTCAAATACC TGTTCCCCGG GGAGTGCCAC TACGTTCTGG TGCAGGATTA 3000 

CTGCGGCAGT AACCCTGGGA CCTTTCGGAT CCTAGTGGGG AATAAGGGAT GCAGCCACCC 3060 

CTCAGTGAAA TGCAAGAAAC GGGTCACCAT CCTGGTGGAG GGAGGAGAGA TTGAGCTGTT 3120 

TGACGGGGAG GTGAATGTGA AGAGGCCCAT GAAGGATGAG ACTCACTTTG AGGTGGTGGA 3180 

GTCTGGCCGG TACATCATTC TGCTGCTGGG CAAAGCCCTC TCCGTGGTCT GGGACCGCCA 3240 

CCTGAGCATC TCCGTGGTCC TGAAGCAGAC ATACCAGGAG AAAGTGTGTG GCCTGTGTGG 3300 

GAATTTTGAT GGCATCCAGA ACAATGACCT CACCAGCAGC AACCTCCAAG TGGAGGAAGA 3360

CCCTGTGGAC TTTGGGAACT CCTGGAAAGT GAGCTCGCAG TGTGCTGACA CCAGAAAAGT 3420 

GCCTCTGGAC TCATCCCCTG CCACCTGCCA TAACAACATC ATGAAGCAGA CGATGGTGGA 3480 

TTCCTCCTGT AGAATCCTTA CCAGTGACGT CTTCCAGGAC TGCAACAAGC TGGTGGACCC 3540

CGAGCCATAT CTGGATGTCT GCATTTACGA CACCTGCTCC TGTGAGTCCA TTGGGGACTG 3600

CGCCTGCTTC TGCGACACCA TTGCTGCCTA TGCCCACGTG TGTGCCCAGC ATGGCAAGGT 3660

GGTGACCTGG AGGACGGCCA CATTGTGCCC CCAGAGCTGC GAGGAGAGGA ATCTCCGGGA 3720

GAACGGGTAT GAGTGTGAGT GGCGCTATAA CAGCTGTGCA CCTGCCTGTC AAGTCACGTG 3780 

TCAGCACCCT GAGCCACTGG CCTGCCCTGT GCAGTGTGTG GAGGGCTGCC ATGCCCACTG 3840

CCCTCCAGGG AAAATCCTGG -^TGAGCTTTT GCAGACCTGC GTTGACCCTG AAGACTGTCC 3900

AGTGTGTGAG GTGGCTGGCC GGCGTTTTGC CTCAGGAAAG AAAGTCACCT TGAATCCCAG 3960

TGACCCTGAG CACTGCCAGA TTTGCCACTG TGATGTTGTC AACCTCACCT GTGAAGCCTG 4020 

CCAGGAGCCG GGAGGCCTGG TGGTGCCTCC CACAGATGCC CCGGTGAGCC CCACCACTCT 4080

GTATGTGGAG GACATCTCGG AACCGCCGTT GCACGATTTC TACTGCAGCA GGCTACTGGA 4140

CCTGGTCTTC CTGCTGGATG GCTCCTCCAG GCTGTCCGAG GCTGAGTTTG AAGTGCTGAA 4200 

GGCCTTTGTG GTGGACATGA TGGAGCGGCT GCGCATCTCC CAGAAGTGGG TCCGCGTGGC 4260

CGTGGTGGAG TACCACGACG GCTCCCACGC CTACATCGGG CTCAAGGACC GGAAGCGACC 4320
GTCAGAGCTG CGGCGCATTG CCAGCCAGGT GAAGTATGCG GGCAGCCAGG TGGCCTCCAC 4380 
CAGCGAGGTC TTGAAATACA CACTGTTCCA AATCTTCAGC AAGATCGACC GCCCTGAAGC 4440
CTCCCGCATC GCCCTGCTCC TGATGGCCAG CCAGGAGCCC CAACGGATGT CCCGGAACTT 4500 
TGTCCGCTAC GTCCAGGGCC TGAAGAAGAA GAAGGTCATT GTGATCCCGG TGGGCATTGG 4560 
GCCCCATGCC AACCTCAAGC AGATCCGCCT CATCGAGAAG CAGGCCCCTG AGAACAAGGC 4620 
CTTCGTGCTG AGCAGTGTGG ATGAGCTGGA GCAGCAAAGG GACGAGATCG TTAGCTACCT 4680 
CTGTGACCTT GCCCCTGAAG CCCCTCCTCC TACTCTGCCC CCCCACATGG CACAAGTCAC 4740 
TGTGGGCCCG GGGCTCTTGG GGGTTTCGAC CCTGGGGCCC AAGAGGAACT CCATGGTTCT 4800 
GGATGTGGCG TTCGTCCTGG AAGGATCGGA CAAAATTGGT GAAGCCGACT TCAACAGGAG 4860 
CAAGGAGTTC ATGGAGGAGG TGATTCAGCG GATGGATGTG GGCCAGGACA GCATCCACGT 4920 
CACGGTGCTG CAGTACTCCT ACATGGTGAC CGTGGAGTAC CCCTTCAGCG AGGCACAGTC 4980
CAAAGGGGAC ATCCTGCAGC GGGTGCGAGA GATCCGCTAC CAGGGCGGCA ACAGGACCAA 5040 
CACTGGGCTG GCCCTGCGGT ACCTCTCTGA CCACAGCTTC TTGGTCAGCC AGGGTGACCG 5100 
GGAGCAGGCG CCCAACCTGG TCTACATGGT CACCGGAAAT CCTGCCTCTG ATGAGATCAA 5160 
GAGGCTGCCT GGAGACATCC AGGTGGTGCC CATTGGAGTG GGCCCTAATG CCAACGTGCA 5220 
GGAGCTGGAG AGGATTGGCT GGCCCAATGC CCCTATCCTC ATCCAGGACT TTGAGACGCT 5280 
CCCCCGAGAG GCTCCTGACC TGGTGCTGCA GAGGTGCTGC TCCGGAGAGG GGCTGCAGAT 5340 
CCCCACCCTC TCCCCTGCAC CTGACTGCAG CCAGCCCCTG GACGTGATCC TTCTCCTGGA 5400 
TGGCTCCTCC AGTTTCCCAG CTTCTTATTT TGATGAAATG AAGAGTTTCG CCAAGGCTTT 5460 
CATTTCAAAA GCCAATATAG GGCCTCGTCT CACTCAGGTG TCAGTGCTGC AGTATGGAAG 5520 
CATCACCACC ATTGACGTGC CATGGAACGT GGTCCCGGAG AAAGCCCATT TGCTGAGCCT 5580
TGTGGACGTC ATGCAGCGGG AGGGAGGCCC CAGCCAAATC GGGGATGCCT TGGGCTTTGC 5640 
TGTGCGATAC TTGACTTCAG AAATGCATGG TGCCAGGCCG GGAGCCTCAA AGGCGGTGGT 5700 
CATCCTGGTC ACGGACGTCT CTGTGGATTC AGTGGATGCA GCAGCTGATG CCGCCAGGTC 5760 
CAACAGAGTG ACAGTGTTCC CTATTGGAAT TGGAGATCGC TACGATGCAG CCCAGCTACG 5820
GATCTTGGCA GGCCCAGCAG GCGACTCCAA CGTGGTGAAG CTCCAGCGAA TCGAAGACCT 5880 
CCCTACCATG GTCACCTTGG GCAATTCCTT CCTCCACAAA CTGTGCTCTG GATTTGTTAG 5940 
GATTTGCATG GATGAGGATG GGAATGAGAA GAGGCCCGGG GACGTCTGGA CCTTGCCAGA 6000 
CCAGTGCCAC ACCGTGACTT GCCAGCCAGA TGGCCAGACC TTGCTGAAGA GTCATCGGGT 6060 
CAACTGTGAC CGGGGGCTGA GGCCTTCGTG CCCTAACAGC CAGTCCCCTG TTAAAGTGGA 6120
AGAGACCTGT GGCTGCCGCT GGACCTGCCC CTGCGTGTGC ACAGGCAGCT CCACTCGGCA 6180
CATCGTGACC TTTGATGGGC AGAATTTCAA GCTGACTGGC AGCTGTTCTT ATGTCCTATT 6240
TCAAAACAAG GAGCAGGACC TGGAGGTGAT TCTCCATAAT GGTGCCTGCA GCCCTGGAGC 6300
AAGGCAGGGC TGCATGAAAT CCATCGAGGT GAAGCACAGT GCCCTCTCCG TCGAGCTGCA 6360 
CAGTGACATG GAGGTGACGG TGAATGGGAG ACTGGTCTCT GTTCCTTACG TGGGTGGGAA 6420 
CATGGAAGTC AACGTTTATG GTGCCATCAT GCATGAGGTC AGATTCAATC ACCTTGGTCA 6480
CATCTTCACA TTCACTCCAC AAAACAATGA GTTCCAACTG CAGCTCAGCC CCAAGACTTT 6540 
TGCTTCAAAG ACGTATGGTC TGTGTGGGAT CTGTGATGAG AACGGAGCCA ATGACTTCAT 6600 
GCTGAGGGAT GGCACAGTCA CCACAGACTG GAAAACACTT GTTCAGGAAT GGACTGTGCA 6660
GCGGCCAGGG CAGACGTGCC AGCCCATCCT GGAGGAGCAG TGTCTTGTCC CCGACAGCTC 6720 
CCACTGCCAG GTCCTCCTCT TACCACTGTT TGCTGAATGC CACAAGGTCC TGGCTCCAGC 6780 
CACATTCTAT GCCATCTGCC AGCAGGACAG TTGCCACCAG GAGCAAGTGT GTGAGGTGAT 6840
CGCCTCTTAT GCCCACCTCT GTCGGACCAA CGGGGTCTGC GTTGACTGGA GGACACCTGA 6900 
TTTCTGTGCT ATGTCATGCC CACCATCTCT GGTCTACAAC CACTGTGAGC ATGGCTGTCC 6960 
CCGGCACTGT GATGGCAACG TGAGCTCCTG TGGGGACCAT CCCTCCGAAG GCTGTTTCTG 7020 
CCCTCCAGAT AAAGTCATGT TGGAAGGCAG CTGTGTCCCT GAAGAGGCCT GCACTCAGTG 7080 
CATTGGTGAG GATGGAGTCC AGCACCAGTT CCTGGAAGCC TGGGTCCCGG ACCACCAGCC 7140 
CTGTCAGATC TGCACATGCC TCAGCGGGCG GAAGGTCAAC TGCACAACGC AGCCCTGCCC 7200 
CACGGCCAAA GCTCCCACGT GTpGCCTGTG TGAAGTAGCC CGCCTCCGCC AGAATGCAGA 7260 
CCAGTGCTGC CCCGAGTATG AGTGTGTGTG TGACCCAGTG AGCTGTGACC TGCCCCCAGT 7320 
GCCTCACTGT GAACGTGGCC TCCAGCCCAC ACTGACCAAC CCTGGCGAGT GCAGACCCAA 7380
CTTCACCTGC GCCTGCAGGA AGGAGGAGTG CAAAAGAGTG TCCCCACCCT CCTGCCCCCC 7440 
GCACCGTTTG CCCACCCTTC GGAAGACCCA GTGCTGTGAT GAGTATGAGT GTGCCTGCAA 7500 
CTGTGTCAAC TCCACAGTGA GCTGTCCCCT TGGGTACTTG GCCTCAACCG CCACCAATGA 7560 
CTGTGGCTGT ACCACAACCA CCTGCCTTCC CGACAAGGTG TGTGTCCACC GAAGCACCAT 7620 
CTACCCTGTG GGCCAGTTCT GGGAGGAGGG CTGCGATGTG TGCACCTGCA CCGACATGGA 7680 
GGATGCCGTG ATGGGCCTCC GCGTGGCCCA GTGCTCCCAG AAGCCCTGTG AGGACAGCTG 7740 
TCGGTCGGGC TTCACTTACG TTCTGCATGA AGGCGAGTGC TGTGGAAGGT GCCTGCCATC 7800
TGCCTGTGAG GTGGTGACTG GCTCACCGCG GGGGGACTCC CAGTCTTCCT GGAAGAGTGT 7860 
CGGCTCCCAG TGGGCCTCCC CGGAGAACCC CTGCCTCATC AATGAGTGTG TCCGAGTGAA 7920 
GGAGGAGGTC TTTATACAAC AAAGGAACGT CTCCTGCCCC *~ 'AGCTGGAGG TCCCTGTCTG 7980 
CCCCTCGGGC TTTCAGCTGA GCTGTAAGAC CTCAGCGTGC TGCCCAAGCT GTCGCTGTGA 8040 
GCGCATGGAG GCCTGCATGC TCAATGGCAC TGTCATTGGG CCCGGGAAGA CTGTGATGAT 8100 
CGATGTGTGC ACGACCTGCC GCTGCATGGT GCAGGTGGGG GTCATCTCTG GATTCAAGCT 8160 
GGAGTGCAGG AAGACCACCT GCAACCCCTG CCCCCTGGGT TACAAGGAAG AAAATAACAC 8220 
AGGTGAATGT TGTGGGAGAT GTTTGCCTAC GGCTTGCACC ATTCAGCTAA GAGGAGGACA 8280
GATCATGACA CTGAAGCGTG ATGAGACGCT CCAGGATGGC TGTGATACTC ACTTCTGCAA 8340
GGTCAATGAG AGAGGAGAGT ACTTCTGGGA GAAGAGGGTC ACAGGCTGCC CACCCTTTGA 8400 
TGAACACAAG TGTCTGGCTG AGGGAGGTAA AATTATGAAA ATTCCAGGCA CCTGCTGTGA 8460 



CACATGTGAG GAGCCTGAGT GCAACGACAT CACTGCCAGG CTGCAGTATG TCAAGGTGGG 8520 

AAGCTGTAAG TCTGAAGTAG AGGTGGATAT CCACTACTGC CAGGGCAAAT GTGCCAGCAA 8580

AGCCATGTAC TCCATTGACA TCAACGATGT GCAGGACCAG TGCTCCTGCT GCTCTCCGAC 8640

ACGGACGGAG CCCATGCAGG TGGCCCTGCA CTGCACCAAT GGCTCTGTTG TGTACCATGA 8700 

GGTTCTCAAT GCCATGGAGT GCAAATGCTC CCCCAGGAAG TGCAGCAAGT GAGGCTGCTG 8760

CAGCTGCATG GGTGCCTGCT GCTGCCTGCC TTGGCCTGAT GGCCAGGCCA GAGTGCTGCC 8820 

AGTCCTCTGC ATGTTCTGCT CTTGTGCCCT TCTGAGCCCA CAATAAAGGC TGAGCTCTTA 8880 
TCTTGCTGCA TGTTCTGCTC TTGTGCCCTT CTGAGCCCAC AAT

AAC7 DNA sequence
Gene name: KIAA1294 protein 
Probeset Accession #: AA432248 
Nucleic Acid Accession #: AB037715 
Coding sequence: 370-3489 (predicted start/stop codons underlined)

GAACGCTCAC AGAACAGGCA GTGCAATTCC ATGTTCCTCT TAAGTATGTT AGCCCTACCG 60 

GGAGCTGAGC TGGCCAGTCT ACTTGGAGAG GAAAAGTAGA TCTGGGGAAG GTGGAAGGGT 120 

CAGTTCCTAA GTGACTTCCT CCTCGGGGAT GGTAAGGGCA TTTGCTGATC TCCAGTGACT 180

GCCTGGTGCC TCATGGTCAG ACTCGGCTGT CTCACTCCCA GATATCTGAT TTTGCAAAAA 240

GGGACACACC TATCTGCAGC AAAGAAGACA CTGACCAGAT TGCGAGCGGT GCTTTTGGAT 300 

GCTCTGTAGC CACCCGGGGC CCAGGAGGAC TGACTCGGCA GCAGGATTCG TGCATGGGAA 360

TCGGAGACCA TGGCAGTGCA GCTGGTGCCC GACTCAGCTC TCGGCCTGCT GATGATGACG 420

GAGGGCCGCC GATGTCAAGT ACATCTTCTT GATGACAGGA AGCTGGAACT CCTAGTACAG 480

CCCAAGCTGT TGGCCAAGGA GCTTCTTGAC CTTGTGGCTT CTCACTTCAA TCTGAAGGAA 540

AAGGAGTACT TTGGAATAGC ATTCACAGAT GAAACGGGAC ACTTAAACTG GCTTCAGCTA 600

GATCGAAGAG TATTGGAACA TGACTTCCCT AAAAAGTCAG GACCCGTGGT TTTATACTTT 660 

TGTGTCAGGT TCTATATAGA AAGCATTTCA TACCTGAAGG ATAATGCTAC CATTGAGCTT 720

TTCTTTCTGA ACGCGAAGTC CTGCATCTAC AAGGAGCTTA TTGACGTTGA CAGCGAAGTG 780

GTGTTTGAAT TAGCTTCCTA TATTTTACAG GAGGCAAAGG GAGATTTTTC TAGCAATGAA 840

GTTGTGAGGA GTGACTTGAA GAAGCTGCCA GCCCTTCCCA CCCAAGCCCT GAAGGAGCAC 900

CCTTCCCTGG CCTACTGTGA AGACAGAGTC ATTGAGCACT ACAAGAAACT GAACGGTCAG 960

ACAAGAGGTC AAGCAATCGT AAACTACATG AGCATCGTGG AGTCTCTCCC AACCTACGGG 1020 

GTTCACTATT ATGCAGTGAA GGACAAGCAG GGCATACCAT GGTGGCTGGG CCTGAGCTAC 1080

AAAGGGATCT TCCAGTATGA CTACCATGAT AAAGTGAAGC CAAGAAAGAT ATTCCAATGG 1140

AGACAGTTGG AAAACCTGTA CTTCAGAGAA AAGAAGTTTT CCGTGGAAGT TCATGACCCA 1200

CGCAGGGCTT CAGTGACAAG GAGGACGTTT GGGCACAGCG GCATTGCAGT GCACACGTGG 1260 

TATGCATGTC CGGCATTGAT CAAGTCCATC TGGGCTATGG CCATAAGCCA ACACCAGTTC 1320

TATCTGGACA GAAAGCAGAG TAAGTCCAAA ATCCATGCAG CACGCAGCCT GAGTGAGATC 1380 

GCCATCGACC TGAQCGAGAC GGGGACGCTG AAGACCTCGA AGCTGGCCAA CATGGGTAGC 1440

AAGGGGAAGA TCATCAGCGG CAGCAGCGGC AGCCTGCTGT CTTCAGGTTC TCAGGAATCA 1500 

GATAGCTCGC AGTCGGCCAA GAAGGACATG CTGGCTGCCT TGAAGTCCAG GCAGGAAGCT 1560 

CTGGAGGAAA CCCTGCGTCA GAGGCTGGAG GAACTGAAGA AGCTGTGTCT CCGAGAAGCT 1620 

GAGCTCACGG GCAAGCTGCC AGTAGAATAT CCCCTGGATC CAGGGGAGGA ACCACCCATT 1680 

GTTCGGAGAA GAATAGGAAC AGCCTTCAAA CTGGATGAAC AGAAAATCCT GCCCAAAGGA 1740

GAGGAAGCTG AGCTGGAACG CCTGGAACGA GAGTTTGCCA TTCAGTCCCA GATTACGGAG 1800 

GCCGCCCGCC GCCTAGCCAG TGACCCCAAC GTCAGCAAAA AACTGAAGAA ACAAAGGAAA 1860

ACCTCGTATC TGAATGCACT GAAGAAACTG CAGGAGATTG AAAATGCAAT CAATGAGAAC 1920 

CGCATCAAGT CTGGGAAGAA ACCCACCCAG AGGGCTTCGC TGATCATAGA CGATGGAAAC 1980 

ATTGCCAGTG AAGACAGCTC CCTCTCAGAT GCCCTTGTTC TTGAGGATGA AGACTCTCAG 2040

GTTACCAGCA CAATATCCCC CCTACATTCT CCTCACAAGG GACTCCCTCC TCGGCCACCG 2100 

TCGCACAACA GGCCTCCTCC TCCCCAGTCC CTGGAGGGAC TCCGACAGAT GCACTATCAC 2160 

CGCAACGACT ATGACAAGTC ACCCATCAAG CCCAAAATGT GGAGTGAGTC CTCTTTAGAT 2220

GAACCCTATG AGAAGGTCAA GAAGCGCTCC TCTCACAGCC ATTCCAGCAG CCACAAGCGC 2280 

TTCCCCAGCA CAGGAAGCTG TGCGGAAGCC GGCGGAGGAA GCAACTCCTT GCAGAACAGC 2340

CCCATCCGCG GCCTCCCGCA CTGGAACTCC CAGTCCAGCA TGCCGTCCAC GCCAGACCTG 2400 

CGGGTCCGGA GTCCCCACTA CGTCCATTCC ACGAGGTCGG TGGACATCAG CCCCACCCGA 2460 

CTGCACAGCC TCGCACTGCA CTTTAGGCAC CGGAGCTCCA GCCTGGAGTC CCAGGGCAAG 2520 

CTCCTGGGCT CGGAAAACGA CACCGGGAGC CCCGACTTCT ACACCCCGCG GACTCGTAGC 2580 

AGCAACGGCT CAGACCCCAT GGACGACTGC TCGTCGTGCA CCAGCCACTC GAGCTCGGAG 2640

CACTACTACC CGGCGCAGAT GAACGCCAAC TACTCCACGC TGGCCGAGGA CTCGCCGTCC 2700 

AAGGCGCGCC AGAGGCAGAG GCAGCGGCAG CGGGCGGCGG GCGCACTGGG CTCAGCCAGC 2760

TCGGGCAGCA TGCCCAACCT GGCGGCGCGC GGGGGTGCGG GGGGCGCGGG GGGCGCGGGG 2820 

GGCGGTGTGT ACCTGCACAG CCAGAGCCAG CCCAGCTCGC AGTACCGCAT CAAGGAGTAC 2880

CCGCTGTACA TCGAGGGCGG CGCCACGCCC GTGGTGGTGC GCAGCCTGGA GAGCGACCAG 2940

GAGTGCCACT ACAGCGTCAA GGCTCAGTTC AAGACGTCCA ACTCCTACAC GGCGGGCGGC 3000

CTGTTCAAGG AGAGCTGGCG CGGCGGCGGC GGCGACGAGG GCGACACGGG CCGCCTGACG 3060
CCGTCGCGAT CGCAGATCCT GCGGACTCCG TCGCTGGGCC GCGAGGGCGC CCACGACAAG 3120 
GGCGCGGGCC GTGCCGCCGT CTCAGACGAG CTGCGCCAGT GGTACCAGCG TTCCACCGCC 3180
TCGCACAAGG AGCACAGCCG CCTGTCGCAC ACCAGCTCCA CCTCCTCGGA CAGCGGCTCG 3240
CAGTACAGCA CCTCCTCCCA GAGCACCTTC GTGGCGCACA GCAGGGTCAC CAGGATGCCC 3300 
CAGATGTGCA AGGCCACGTC AGCTGCCTTA CCTCAAAGCC AGAGAAGCTC GACACCGTCA 3360 
AGTGAAATTG GAGCCACCCC CCCAAGCAGC CCCCACCACA TCCTAACCTG GCAGACTGGA 3420 
GAAGCAACAG AAAACTCACC CATTCTGGAT GGGTCTGAGT CTCCACCTCA CCAAAGTACT 3480 
GATGAATAGA GGAGCTACAA TGATAGCTGT TTCCTGGATT CCTCCCTCTA TCCAGAACTA 3540 
GCTGATGTCC AGTGGTACGG GCAGGAAAAA GCCAAGCCCG GGACCCTCGT GTGAGCCAGC 3600
CCGGCCTAAT CTGACCGCCT CAACGCCATT CTGAGATCAC CTCACTGCCT CTCATTTGCC 3660 
TTACCCAGAC GCACCGTCAC CCTGCACCAG CTTTGGCCCT CAGCACTTTT TTTCTCCTGT 3720
CTCCGCATTC CCTCCCCCTT GAAAACCTGA CTGAGGAGAC ATTCTGGAAG GTTCCGGTCC 3780
CACTGTGTGT CCCCTGGCGC TCTTGCCCAT AGAGAGCCAG ACACCAATCC TCAATGGCAC 3840 
CTTGGTGGCT TCCCTCTGCC ATGACAGCCC CTAGGCCAGG AACCATCAGG GGGGCCAGCC 3900 
GGCATCCAAT TCCTGCGGAT AAGTAGCGTT GGGAGAGAAC GGGAAAGGGG ACTTGGGTTA 3960
CAGGGTGACC CAGAAAGACG ATTCAGCTGT GTCCAGCCTG CCACCCATAC GTAGGCCAAC 4020 
CAAGCACTTC ATGAAGAGGA GGCCTCGTGG CATATTCAGT TTACACCTGA AATATTCCTT 4080 
GATGGGACAG CTTGTGGGGA TGGCTATGGG GGAAGGGGAG GTTGAGAAAG GAAGTTCTCG 4140
ACACCAGAAA TGCATCGGAG GACCACAATC AGTTCTATGC TGCCAAAGAT TAAAAATAAA 4200
TAAAAACATA AAAAATTAAG AGGGGCCAAG AGGAAGACAT TCTTTCTGCA AGGAAATTTC 4260 
TTTTAAATTC TGAACTGCTA CTACACACAA GTGAAAGTCA ACCCTATGTA AACTGGTGTC 4320 
CTCTCTCTAG CCCTCTCCCT TACTGGCCCA CTTCTCTCTC CGTAGAGAGC CTGAAAAACT 4380 
GCCCCAATGC CACGGTAAAG GCGAGGAAGT CTTGGCTGGC GTTGCTGACT CACAGTCGCC 4440
ATCCATCTGG ACACAAAGAG AGACCTGTGG GAGTCATAGA GGGTACTGTT AGCCCCGGTC 4500
CATGCAGGGG GTTCAGCCGA GCCCAAGACT CAAAGCTGCT TTCCTTTCAG GATTTGTAGT 4560
AACGTAAGGT GATAATGGCC AAAAGTGGTT CTCTCTCATT AAACCAACCA GTAAAAGCGT 4620
ATCCTATTTT TTTGCATAAG GTGTTTCATT TTCGTTTTTA TGGGAAACCA AGGGAAAAGC 4680 
ACATTGCGAT CCATTCAGTG TTTAACTGTC GTGGCTCATT TTCTGTTCGT TAGCACTTGT 4740 
GTGACAAAAG AGCTCAGATC CGACTTCTCC TATGTGTCAC TTATTCCAAG AACCCAACTA 4800
TGCCCTTAGG TAGAAAGATT TGACTCGTGT GTCTACTAGC CAACAGGCAG AGCAGGGTTG 4860
AAAAAAATAT CAGCTCCCAA AGGGCCCATG TGTCTACATC ATCAGTTACT GTCATGCACC 4920 
ACATTTGTGT GCAGATACCA AAAGAGGAGG AAAGAAGAAA AAAATTAATG TGTGGGAGCT 4980 
GCACGTTTAC ATGTTTTGAG CTATGCTTCA AACACAACTG GAAAGCCATC AATCTTCAAA 5040 
GGCCTCAAAA ATACTTTTAT AGTAACAAGT GCACGACTTT AGTTGGGTTA TTCAAGATGG 5100 
CACAAAAAGG TTTCCGCAGA GGTGGTATGC TGTGCTTTTG GCGCAAGTGG TGGGGGGATG 5160 
GGGGTGGGGG TGGAATTTTT TTCTCACTCT AATGACTTCC TATTGGAAAG GCATTGACAG 5220 
CCAGGGACAG GAGCCAGGGT GGGGGTAGTT TTGTGGGAAA GCAGAACTGA AGTTAGCTTA 5280 
AGCATAAAAA CAAAGAAAAA TCTTCGCTTT TCATGTATGT GGAATCCAAG AATAACCATA 5340 
GGCTCTACCA GACCAGGAGG GTAAGGATGG ACACTAAAAT GAAACAAATA CCAAGGTATT 5400 
CCTTCTGCTG CAGCCTGGAG ACCACCGAGA GTCGAGCTGG GGCACACACA CACCTGGCCG 5460 
GGACCCGGCA GGGACAAGGC GGGCCGTGGC CTCCTCCACC AAGTCTCTCT AGACAATTCA 5520 
GGGCCTGCTT TCCCCAGCTC CATGCATGGC TGGACTGGTG ATTCCAGGGT GCAGAAGGGA 5580 
TTCATATTCC CAGAACGCTT TAAGTGTACA CCTGCAGGAT AAAGAGATAC CGGTTACATT 5640 
ATTAAATGAT TCTAGGGATT CACTGGGGGA TATTTTTGTT GCTTTTACTT TCATGGTTAG 5700 
AGCTACAAAG AACAGTGATT TTTTTTTTTT CTCCCTTCCC CATTCAGAAA CATTATACAT 5760 
TGGGCCATTT TTCTTTCTCC CAAAGAAGAT TCATGGATAG TCAGACTGAA CTGTGTGCAA 5820 
CAGGAAAAGT CAAAAGGGAA AAGGCAGCTG ATGAGGTTAC ATGGTTACAT GTTCTACATC 5880 
ATGCAGAGTA GCTTGAAATC TAGTCTGGAG AAAACTGGAT CAAGATTCTA GCCCACTGGA 5940 
GTTGCAAGGA ATGAGAGGCA AAAATTCTAA AGATTTGGGT TATATTTTCA ACTTGGGGGA 6000
CAGAGAGAAA TGGAGAGCAG GAATTACAGT TCCAACAAAC ATCATGATAG TCTGGTAGTC 6060
AAGACAGAGA TTAAGTAAAA CAGGTTTTAC TGTTTAGCTG AGTTCAGTTA ATACAAAATG 6120 
TACATAAAAC GTTAGTCCTT TGAGACTGAC ATGATTAATG ATCAGTGTGG TGGGAAATGA 6180 
TGTAGTTATT GTACACAAGC ACTXGCAAAC TCTTTATCCC TATTTCTTTA AAACAAAATA 6240
AGGTGAAATA CGAAGTCCTT GGTCTGATAT AAAGCCCCTA TTGGATTCTT CGGATGCGTA 6300 
AAAGAAATTG CCTGTTTCAG CCAGAAGACT GGTGAAAACA CATACATCAG ACTATGTTGT 6360
GAGCCAGGTT GATTTTTTAT TTTATTATAT GCAGGTGAGT GTTGAAACTG TTAAAATTCC 6420
AATTTGTTTT CATTCAGTAT TAGTTTAGTT CTAAATATAG CAAACCCCAT CCAGGTGCTA 6480 
TCAGATGACC AGTTACTGCT TAGTTAACTA GGTGTAAAGT TTTACATATA CATTAATTTC 6540
AATAGTTTAT TACAAGTTGT GTAAAATGGA CTCTAGTTTA ATAATGGGGG AAAAAAGATT 6600 
AGGTTG^CC TGAAACTGAC TGTAGAGCAT GTAAAATGAT TTTACTGGAT TCTGTTCAAC 6660 
TGTAAT\AT GAAAAAGATG TACGTTGTAG ACAAAGTTGC AGAATTAAAA AAAGAAATCT 6720 
GCTTTTAATT TATTCTTTTT GTATTAAGAA TTTGTATAGT ATCTTTACAT TTTGCAAAAC 6780
AGTGTTGTCA ACACTTATTA AAGCATTTTC AAAATG 



ACG8 DNA sequence 

Gene name: ubiquitin E3 ligase SMURF2 
Unigene number: Hs. 21806 (3'UTR only) 
Probeset Accession #: AA398243
Nucleic Acid Accession #: AF301463 cluster

Coding sequence: 9-2255 (predicted start/stop codons underlined) 

CCGGGGACAT GTCTAACCCC GGAGGCCGGA GGAACGGGCC CGTCAAGCTG CGCCTGACAG 60

TACTCTGTGC AAAAAACCTG GTGAAAAAGG ATTTTTTCCG ACTTCCTGAT CCATTTGCTA 120 

AGGTGGTGGT TGATGGATCT GGGCAATGCC ATTCTACAGA TACTGTGAAG AATACGCTTG 180 

ATCCAAAGTG GAATCAGCAT TATGACCTGT ATATTGGAAA GTCTGATTCA GTTACGATCA 240 

GTGTATGGAA TCACAAGAAG ATCCATAAGA AACAAGGTGC TGGATTTCTC GGTTGTGTTC 300 

GTCTTCTTTC CAATGCCATC AACCGCCTCA AAGACACTGG TTATCAGAGG TTGGATTTAT 360 

GCAAACTCGG GCCAAATGAC AATGATACAG TTAGAGGACA GATAGTAGTA AGTCTTCAGT 420 

CCAGAGACCG AATAGGCACA GGAGGACAAG TTGTGGACTG CAGTCGTTTA TTTGATAACG 480 

ATTTACCAGA CGGCTGGGAA GAAAGGAGAA CCGCCTCTGG AAGAATCCAG TATCTAAACC 540

ATATAACAAG AACTACGCAA TGGGAGCGCC CAACACGACC GGCATCCGAA TATTCTAGCC 600 

CTGGCAGACC TCTTAGCTGC TTTGTTGATG AGAACACTCC AATTAGTGGA ACAAATGGTG 660 

CAACATGTGG ACAGTCTTCA GATCCCAGGC TGGCAGAGAG GAGAGTCAGG TCACAACGAC 720 

ATAGAAATTA CATGAGCAGA ACACATTTAC ATACTCCTCC AGACCTACCA GAAGGCTATG 780 

AACAGAGGAC AACGCAACAA GGCCAGGTGT ATTTCTTACA TACACAGACT GGTGTGAGCA 840 

CATGGCATGA TCCAAGAGTG CCCAGGGATC TTAGCAACAT CAATTGTGAA GAGCTTGGTC 900 

CGTTGCCTCC TGGATGGGAG ATCCGTAATA CGGCAACAGG CAGAGTTTAT TTCGTTGACC 960 

ATAACAACAG AACAACACAA TTTACAGATC CTCGGCTGTC TGCTAACTTG CATTTAGTTT 1020 

TAAATCGGCA GAACCAATTG AAAGACCAAC AGCAACAGCA AGTGGTATCG TTATGTCCTG 1080

ATGACACAGA ATGCCTGACA GTCCCAAGGT ACAAGCGAGA CCTGGTTCAG AAACTAAAAA 1140 

TTTTGCGGCA AGAACTTTCC CAACAACAGC CTCAGGCAGG TCATTGCCGC ATTGAGGTTT 1200

CCAGGGAAGA GATTTTTGAG GAATCATATC GACAGGTCAT GAAAATGAGA CCAAAAGATC 1260

TCTGGAAGCG ATTAATGATA AAATTTCGTG GAGAAGAAGG CCTTGACTAT GGAGGCGTTG 1320 

CCAGGGAATG GTTGTATCTC TTGTCACATG AAATGTTGAA TCCATACTAT GGCCTCTTCC 1380

AGTATTCAAG AGATGATATT TATACATTGC AGATCAATCC TGATTCTGCA GTTAATCCGG 1440

AACATTTATC CTATTTCCAC TTTGTTGGAC GAATAATGGG AATGGCTGTG TTTCATGGAC 1500

ATTATATTGA TGGTGGTTTC ACATTGCCTT TTTATAAGCA ATTGCTTGGG AAGTCAATTA 1560 

CCTTGGATGA CATGGAGTTA GTAGATCCGG ATCTTCACAA CAGTTTAGTG TGGATACTTG 1620 

AGAATGATAT TACAGGTGTT TTGGACCATA CCTTCTGTGT TGAACATAAT GCATATGGTG 1680 

AAATTATTCA GCATGAACTT AAACCAAATG GCAAAAGTAT CCCTGTTAAT GAAGAAAATA 1740

AAAAAGAATA TGTCAGGCTC TATGTGAACT GGAGATTTTT ACGAGGCATT GAGGCTCAAT 1800 

TCTTGGCTCT GCAGAAAGGA TTTAATGAAG TAATTCCACA ACATCTGCTG AAGACATTTG 1860 

ATGAGAAGGA GTTAGAGCTC ATTATTTGTG GACTTGGAAA GATAGATGTT AATGACTGGA 1920 

AGGTAAACAC CCGGTTAAAA CACTGTACAC CAGACAGCAA CATTGTCAAA TGGTTCTGGA 1980 

AAGCTGTGGA GTTTTTTGAT GAAGAGCGAC GAGCAAGATT GCTTCAGTTT GTGACAGGAT 2040 

CCTCTCGAGT GCCTCTGCAG GGCTTCAAAG CATTGCAAGG TGCTGCAGGC CCGAGACTCT 2100 

TTACCATACA CCAGATTGAT GCCTGCACTA ACAACCTGCC GAAAGCCCAC ACTTGCTTCA 2160 

ATCGAATAGA CATTCCACCC TATGAAAGCT ATGAAAAGCT ATATGAAAAG CTGCTAACAG 2220 
CCATTGAAGA AACATGTGGA TTTGCTGTGG AATGACAAGC TTCAAGGATT TACCCAGGAC 



ACH1 DNA sequence 

Gene name: EST 

Unigene number: Hs. 30089 

Probeset Accession #: AA410480 

CAT cluster**: 96816_1

Coding sequence: Partial sequence, possible frameshift. Predicted stop codon 
underlined. 

CTCCACTATG GACAGAGCCT CCACTGAGCT GCTGCCTGCC CGCCACATAC CCAGCTGACA 60

GGGGCCCCGC AGAGCCATGC AGCTGTGCTG GGGTGATCCT GGGCTTCCTC CTGTTCCGAG 120

GCCACAACTC CCAGCCCACA ATGACCCAGA CCTCTAGCTC TCAGGGAGGC CTTGGCGGTC 180

TAAGTCTGAC CACAGAGCCA GTTTCTTCCA ACCCAGGATA CATCCCTTCC TCAGAGGCTA 240 

ACAGGCCAAG CCATCTGTCC AGCACTGGTA CCCCAGGCGC AGGTGTCCCC AGCAGTGGAA 300

GAGACGGAGG CACAAGCAGA GACACATTTC AAACTGTTCC CCCCAATTCA ACCACCATGA 360

GCCTGAGCAT GAGGGAAGAT GCGACCATCC TGCCCAGCCC CACGTCAGAG ACTGTGCTCA 420 

CTGTGGCTGC ATTTGGTGTT ATCAGCTTCA TTGTCATCCT GGTGGTTGTG GTGATCATCC 480 

TAGTTGGTGT GGTCAGCCTG AGGTTC^.aGT GTCGGAAGAG CAAGGAGTCT GGAGATCCCC 540

AGAAACCTGG AGAGCGGGAG GAGAAGGTGG GACATAGGAG GGAACCCTAC CCCTGGAATT 600 

GACTTGGACT CTGGGTCTGG AAACGCAAGT TCAAATCTCA CCCATTTGTT CCAGGAGGTT 660 

CTGGCTGATG AGGAAGACCC TTGTGGGAGG GGGGCCCCTG CCCTCCAGTT AGCTCTTCTT 720

GGCTGTGCTG GGTTCCATGT TCTCATGCAG GGATGGAGTC GGGTGGAGAG CCCACTCTGG 780

CTAGGGGGCG GCAGGCTGAG AGCTCACCTG TTCAGCAGAG AAGTGGAACT CACTTTGCTC 840

CTGGAGCCTC CCTACACAGT ACTTATCTGG GAAGGGAATG CCGGACTCTT GTTGGCCCCT 900 
TTGTCCCCCC GACTGGCCCC CTTCGCCG 

ACJ2 DNA sequence

Gene name: Complement component C1q receptor
Unigene number: Hs.97199
Probeset Accession #: AA487558 
Nucleic Acid Accession #: NM_012072 

Coding sequence: 149-2107. Predicted start/stop codons underlined.

AAAGCCCTCA GCCTTTGTGT CCTTCTCTGC GCCGGAGTGG CTGCAGCTCA CCCCTCAGCT 60 

CCCCTTGGGG CCCAGCTGGG AGCCGAGATA GAAGCTCCTG TCGCCGCTGG GCTTCTCGCC 120 

TCCCGCAGAG GGCCACACAG AGACCGGGAT GGCCACCTCC ATGGGCCTGC TGCTGCTGCT 180

GCTGCTGCTC CTGACCCAGC CCGGGGCGGG GACGGGAGCT GACACGGAGG CGGTGGTCTG 240 

CGTGGGGACC GCCTGCTACA CGGCCCACTC GGGCAAGCTG AGCGCTGCCG AGGCCCAGAA 300 

CCACTGCAAC CAGAACGGGG GCAACCTGGC CACTGTGAAG AGCAAGGAGG AGGCCCAGCA 360 

CGTCCAGCGA GTACTGGCCC AGCTCCTGAG GCGGGAGGCA GCCCTGACGG CGAGGATGAG 420 

CAAGTTCTGG ATTGGGCTCC AGCGAGAGAA GGGCAAGTGC CTGGACCCTA GTCTGCCGCT 480

GAAGGGCTTC AGCTGGGTGG GCGGGGGGGA GGACACGCCT TACTCTAACT GGCACAAGGA 540 

GCTCCGGAAC TCGTGCATCT CCAAGCGCTG TGTGTCTCTG CTGCTGGACC TGTCCCAGCC 600 

GCTCCTTCCC AACCGCCTGC CCAAGTGGTC TGAGGGCCCC TGTGGGAGCC CAGGCTCCCC 660 

CGGAAGTAAC ATTGAGGGCT TCGTGTGCAA GTTCAGCTTC AAAGGCATGT GCCGGCCTCT 720 

GGCCCTGGGG GGCCCAGGTC AGGTGACCTA CACCACCCCC TTCCAGACCA CCAGTTCCTC 780 

CTTGGAGGCT GTGCCCTTTG CCTCTGCGGC CAATGTAGCC TGTGGGGAAG GTGACAAGGA 840

CGAGACTCAG AGTCATTATT TCCTGTGCAA GGAGAAGGCC CCCGATGTGT TCGACTGGGG 900 

CAGCTCGGGC CCCCTCTGTG TCAGCCCCAA GTATGGCTGC AACTTCAACA ATGGGGGCTG 960 

CCACCAGGAC TGCTTTGAAG GGG^GGATGG CTCCTTCCTC TGCGGCTGCC GACCAGGATT 1020 

CCGGCTGCTG GATGACCTGG TGACCTGTGC CTCTCGAAAC CCTTGCAGCT CCAGCCCATG 1080 

TCGTGGGGGG GCCACGTGCG TCCTGGGACC CCATGGGAAA AACTACACGT GCCGCTGCCC 1140

CCAAGGGTAC CAGCTGGACT CGAGTCAGCT GGACTGTGTG GACGTGGATG AATGCCAGGA 1200 

CTCCCCCTGT GCCCAGGAGT GTGTCAACAC CCCTGGGGGC TTCCGCTGCG AATGCTGGGT 1260 

TGGCTATGAG CCGGGCGGTC CTGGAGAGGG GGCCTGTCAG GATGTGGATG AGTGTGCTCT 1320

GGGTCGCTCG CCTTGCGCCC AGGGCTGCAC CAACACAGAT GGCTCATTTC ACTGCTCCTG 1380 

TGAGGAGGGC TACGTCCTGG CCGGGGAGGA CGGGACTCAG TGCCAGGACG TGGATGAGTG 1440 

TGTGGGCCCG GGGGGCCCCC TCTGCGACAG CTTGTGCTTC AACACACAAG GGTCCTTCCA 1500 

CTGTGGCTGC CTGCCAGGCT GGGTGCTGGC CCCAAATGGG GTCTCTTGCA CCATGGGGCC 1560 

TGTGTCTCTG GGACCACCAT CTGGGCCCCC CGATGAGGAG GACAAAGGAG AGAAAGAAGG 1620 

GAGCACCGTG CCCCGCGCTG CAACAGCCAG TCCCACAAGG GGCCCCGAGG GCACCCCCAA 1680 

GGCTACACCC ACCACAAGTA GACCTTCGCT GTCATCTGAC GCCCCCATCA CATCTGCCCC 1740 

ACTCAAGATG CTGGCCCCCA GTGGGTCCTC AGGCGTCTGG AGGGAGCCCA GCATCCATCA 1800 

CGCCACAGCT GCCTCTGGCC CCCAGGAGCC TGCAGGTGGG GACTCCTCCG TGGCCACACA 1860 

AAACAACGAT GGCACTGACG GGCAAAAGCT GCTTTTATTC TACATCCTAG GCACCGTGGT 1920 

GGCCATCCTA CTCCTGCTGG CCCTGGCTCT GGGGCTACTG GTCTATCGCA AGCGGAGAGC 1980 

GAAGAGGGAG GAGAAGAAGG AGAAGAAGCC CCAGAATGCG GCAGACAGTT ACTCCTGGGT 2040 

TCCAGAGCGA GCTGAGAGCA GGGCCATGGA GAACCAGTAC AGTCCGACAC CTGGGACAGA 2100 

CTGCTGAAAG TGAGGTGGCC CTAGAGACAC TAGAGTCACC AGCCACCATC CTCAGAGCTT 2160 

TGAACTCCCC ATTCCAAAGG GGCACCCACA TTTTTTTGAA AGACTGGACT GGAATCTTAG 2220 

CAAACAATTG TAAGTCTCCT CCTTAAAGGC CCCTTGGAAC ATGCAGGTAT TTTCTACGGG 2280 

TGTTTGATGT TCCTGAAGTG GAAGCTGTGT GTTGGCGTGC CACGGTGGGG ATTTCGTGAC 2340

TCTATAATGA TTGTTACTCC CCCTCCCTTT TCAAATTCCA ATGTGACCAA TTCCGGATCA 2400

GGGTGTGAGG AGGCTGGGGC TAAGGGGCTC CCCTGAATAT CTTCTCTGCT CACTTCCACC 2460 

ATCTAAGAGG AAAAGGTGAG TTGCTCATGC TGATTAGGAT TGAAATGATT TGTTTCTCTT 2520 

CCTAGGATGA AAACTAAATC AATTAATTAT TCAATTAGGT AAGAAGATCT GGTTTTTTGG 2580 

TCAAAGGGAA CATGTTCGGA CTGGAAACAT TTCTTTACAT TTGCATTCCT CCATTTCGCC 2640 

AGCACAAGTC TTGCTAAATG TGATACTGTT GACATCCTCC AGAATGGCCA GAAGTGCAAT 2700

TAACCTCTTA GGTGGCAAGG AGGCAGGAAG TGCCTCTTTA GTTCTTACAT TTCTAATAGC 2760

CTTGGGTTTA TTTGCAAAGG AAGCTTGAAA AATATGAGAA AAGTTGCTTG AAGTGCATTA 2820 

CAGGTGTTTG TGAAGTCACA TAATCTACGG GGCTAGGGCG AGAGAGGCCA GGGATTTGTT 2880

CACAGATACT TGAATTAATT CATCCAAATG TACTGAGGTT ACCACACACT TGACTACGGA 2940

TGTGATCAAC ACTAACAAGG AAACAAATTC AAGGACAACC TGTCTTTGAG CCAGGGCAGG 3000

CCTCAGACAC CCTGCCTGTG GCCCCGCCTC CACTTCATCC TGCCCGGAAT GCCAGTGCTC 3060

CGAGCTCAGA CAGAGGAAGC CCTGCAGAAA GTTCCATCAG GCTGTTT ~T AAAGGATGTG 3120

TGAACGGGAG ATGATGCACT GTGTTTTGAA AGTTGTCATT TTAAAGCATT TTAGCACAGT 3180

TCATAGTCCA CAGTTGATGC AGCATCCTGA GATTTTAAAT CCTGAAGTGT GGGTGGCGCA 3240 

CACACCAAGT AGGGAGCTAG TCAGGCAGTT TGCTTAAGGA ACTTTTGTTC TCTGTCTCTT 3300

TTCCTTAAAA TTGGGGGTAA GGAGGGAAGG AAGAGGGAAA GAGATGACTA ACTAAAATCA 3360

TTTTTACAGC AAAAACTGCT CAAAGCCATT TAAATTATAT CCTCATTTTA AAAGTTACAT 3420

TTGCAAATAT TTCTCCCTAT GATAATGCAG TCGATAGTGT GCACTCTTTC TCTCTCTCTC 3480

TCTCTCTCAC ACACACACAC ACACACACAC ACACACACAC AGAGACACGG CACCATTCTG 3540

CCTGGGGCAC TGGAACACAT TCCTGGGGGT CACCGATGGT CAGAGTCACT AGAAGTTACC 3600 
TGAGTATCTC TGGGAGGCCT CATGTCTCCT GTGGGCTTTT TACCACCACT GTGCAGGAGA 3660

ACAGACAGAG GAAATGTGTC TCCCTCCAAG GCCCCAAAGC CTCAGAGAAA GGGTGTTTCT 3720 

GGTTTTGCCT TAGCAATGCA TCGGTCTCTG AGGTGACACT CTGGAGTGGT TGAAGGGCCA 3780 

CAAGGTGCAG GGTTAATACT CTTGCCAGTT TTGAAATATA GATGCTATGG TTCAGATTGT 3840 

TTTTAATAGA AAACTAAAGG GGCAGGGGAA GTGAAAGGAA AGATGGAGGT TTTGTGCGGC 3900 

TCGATGGGGC ATTTGGAACT TCTTTTTAAA GTCATCTCAT GGTCTCCAGT TTTCAGTTGG 3960 

AACTCTGGTG TTTAACACTT AAGGGAGACA AAGGCTGTGT CCATTTGGCA AAACTTCCTT 4020 

GGCCACGAGA CTCTAGGTGA TGTGTGAAGC TGGGCAGTCT GTGGTGTGGA GAGCAGCCAT 4080 

CTGTCTGGCC ATTCAGAGGA TTCTAAAGAC ATGGCTGGAT GCGCTGCTGA CCAACATCAG 4140

CACTTAAATA AATGCAAATG CAACATTTCT CCCTCTGGGC CTTGAAAATC CTTGCCCTTA 4200 

TCATTTGGGG TGAAGGAGAC ATTTCTGTCC TTGGCTTCCC ACAGCCCCAA CGCAGTCTGT 4260 

GTATGATTCC TGGGATCCAA CGAGCCCTCC TATTTTCACA GTGTTCTGAT TGCTCTCACA 4320 

GCCCAGGCCC ATCGTCTGTT CTCTGAATGC AGCCCTGTTC TCAACAACAG GGAGGTCATG 4380

GAACCCCTCT GTGGAACCCA CAAGGGGAGA AATGGGTGAT AAAGAATCCA GTTCCTCAAA 4440 

ACCTTCCCTG GCAGGCTGGG TCCCTCTCCT GCTGGGTGGT GCTTTCTCTT GCACACCACT 4500 

CCCACCACGG GGGGAGAGCC AGCAACCCAA CCAGACAGCT CAGGTTGTGC ATCTGATGGA 4560 

AACCACTGGG CTCAAACACG TGCTTTATTC TCCTGTTTAT TTTTGCTGTT ACTTTGAAGC 4620 

ATGGAAATTC TTGTTTGGGG GATCTTGGGG CTACAGTAGT GGGTAAACAA ATGCCCACCG 4680

GCCAAGAGGC CATTAACAAA TCGTCCTTGT CCTGAGGGGC CCCAGCTTGC TCGGGCGTGG 4740 

CACAGTGGGG AATCCAAGGG TCACAGTATG GGGAGAGGTG CACCCTGCCA CCTGCTAACT 4800 

TCTCGCTAGA CACAGTGTTT CTGCCCAGGT GACCTGTTCA GCAGCAGAAC AAGCCAGGGC 4860 

CATGGGGACG GGGGAAGTTT TCACTTGGAG ATGGACACCA AGACAATGAA GATTTGTTGT 4920 

CCAAATAGGT CAATAATTCT GGGAGACTCT TGGAAAAAAC TGAATATATT CAGGACCAAC 4980

TCTCTCCCTC CCCTCATCCC ACATCTCAAA GCAGACAATG TAAAGAGAGA ACATCTCACA 5040 

CACCCAGCTC GCCATGCCTA CTCATTCCTG AATTTCAGGT GCCATCACTG CTCTTTCTTT 5100 

CTTCTTTGTC ATTTGAGAAA GGATGCAGGA GGACAATTCC CACAGATAAT CTGAGGAATG 5160 

CAGAAAAACC AGGGCAGGAC AGTTATCGAC AATGCATTAG AACTTGGTGA GCATCCTCTG 5220 

TAGAGGGACT CCACCCCTGC TCAACAGCTT GGCTTCCAGG CAAGACCAAC CACATCTGGT 5280

CTCTGCCTTC GGTGGCCCAC ACACCTAAGC GTCATCGTCA TTGCCATAGC ATCATGATGC 5340 

AACACATCTA CGTGTAGCAC TACGACGTTA TGTTTGGGTA ATGTGGGGAT GAACTGCATG 5400

AGGCTCTGAT TAAGGATGTG GGGAAGTGGG CTGCGGTCAC TGTCGGCCTT GCAAGGCCAC 5460 

CTGGAGGCCT GTCTGTTAGC CAGTGGTGGA GGAGCAAGGC TTCAGGAAGG GCCAGCCACA 5520 

TGCCATCTTC CCTGCGATCA GGCAAAAAAG TGGAATTAAA AAGTCAAACC TTTATATGCA 5580 

TGTGTTATGT CCATTTTGCA GGATGAACTG AGTTTAAAAG AATTTTTTTT TCTCTTCAAG 5640 

TTGCTTTGTC TTTTCCATCC TCATCACAAG CCCTTGTTTG AGTGTCTTAT CCCTGAGCAA 5700 

TCTTTCGATG GATGGAGATG ATCATTAGGT ACTTTTGTTT CAACCTTTAT TCCTGTAAAT 5760 

ATTTCTGTGA AAACTAGGAG AACAGAGATG AGATTTGACA AAAAAAAATT GAATTAAAAA 5820 

TAACACAGTC TTTTTAAAAC TAACATAGGA AAGCCTTTCC TATTATTTCT CTTCTTAGCT 5880 

TCTCCATTGT CTAAATCAGG AAAACAGGAA AACACAGCTT TCTAGCAGCT GCAAAATGGT 5940 

TTAATGCCCC CTACATATTT CCATCACCTT GAACAATAGC TTTAGCTTGG GAATCTGAGA 6000

TATGATCCCA GAAAACATCT GTCTCTACTT CGGCTGCAAA ACCCATGGTT TAAATCTATA 6060 

TGGTTTGTGC ATTTTCTCAA CTAAAAATAG AGATGATAAT CCGAATTCTC CATATATTCA 6120 

CTAATCAAAG ACACTATTTT CATACTAGAT TCCTGAGACA AATACTCACT GAAGGGCTTG 6180 

TTTAAAAATA AATTGTGTTT TGGTCTGTTC TTGTAGATAA TGCCCTTCTA TTTTAGGTAG 6240 

AAGCTCTGGA ATCCCTTTAT TGTGCTGTTG CTCTTATCTG CAAGGTGGCA AGCAGTTCTT 6300

TTCAGCAGAT TTTGCCCACT ATTCCTCTGA GCTGAAGTTC TTTGCATAGA TTTGGCTTAA 6360 

GCTTGAATTA GATCCCTGCA AAGGCTTGCT CTGTGATGTC AGATGTAATT GTAAATGTCA 6420 

GTAATCACTT CATGAATGCT A^ATGAGAAT GTAAGTATTT TTAAATGTGT GTATTTCAAA 6480 

TTTGTTTGAC TAATTCTGGA ATTACAAGAT TTCTATGCAG GATTTACCTT CATCCTGTGC 6540 

ATGTTTCCCA AACTGTGAGG AGGGAAGGCT CAGAGATCGA GCTTCTCCTC TGAGTTCTAA 6600 

CAAAATGGTG CTTTGAGGGT CAGCCTTTAG GAAGGTGCAG CTTTGTTGTC CTTTGAGCTT 6660 
TCTGTTATGT GCCTATCCTA ATAAACTCTT AAACACATT 



ACJ3 DNA sequence

Gene name: FLT1/vascular endothelial growth factor receptor
Unigene number: Hs.138671
Probeset Accession #: AA047437 
Nucleic Acid Accession #: NM_002019 

Coding sequence: 250-4266 (predicted start/stop codons underlined) 

GCGGACACTC CTCTCGGCTC CTCCCCGGCA GCGGCGGCGG CTCGGAGCGG GCTCCGGGGC 60 

TCGGGTGCAG CGGCCAGCGG GCCTGGCGGC GAGGATTACC CGGGGAAGTG GTTGTCTCCT 120

GGCTGGAGCC GCGAGACGGG CGCTCAGGGC GCGGGGCCGG CGGCGGCGAA CGAGAGGACG 180

GACTCTGGCG GCCGGGTCGT TGGCCGGGGG AGCGCGGGCA CCGGGCGAGC AGGCCGCGTC 240 

GCGCTCACCA TGGTCAGCTA CTGGGACACC GGGGTCCTGC TGTGCGCGCT GCTCAGCTGT 300

CTGCTTCTCA CAGGATCTAG TTCAGGTTCA AAATTAAAAG ATCCTGAACT GAGTTTAAAA 360

GGCACCCAGC ACATCATGCA AGCAGGCCAG ACACTGCATC TCCAATGCAG GGGGGAAGCA 420 
GCCCATAAAT GGTCTTTGCC TGAAATGGTG AGTAAGGAAA GCGAAAGGCT GAGCATAACT 480

AAATCTGCCT GTGGAAGAAA TGGCAAACAA TTCTGCAGTA CTTTAACCTT GAACACAGCT 540

CAAGCAAACC ACACTGGCTT CTACAGCTGC AAATATCTAG CTGTACCTAC TTCAAAGAAG 600 

AAGGAAACAG AATCTGCAAT CTATATATTT ATTAGTGATA CAGGTAGACC TTTCGTAGAG 660 

ATGTACAGTG AAATCCCCGA AATTATACAC ATGACTGAAG GAAGGGAGCT CGTCATTCCC 720 

TGCCGGGTTA CGTCACCTAA CATCACTGTT ACTTTAAAAA AGTTTCCACT TGACACTTTG 780 

ATCCCTGATG GAAAACGCAT AATCTGGGAC AGTAGAAAGG GCTTCATCAT ATCAAATGCA 840 

ACGTACAAAG AAATAGGGCT TCTGACCTGT GAAGCAACAG TCAATGGGCA TTTGTATAAG 900 

ACAAACTATC TCACACATCG ACAAACCAAT ACAATCATAG ATGTCCAAAT AAGCACACCA 960 

CGCCCAGTCA AATTACTTAG AGGCCATACT CTTGTCCTCA ATTGTACTGC TACCACTCCC 1020 

TTGAACACGA GAGTTCAAAT GACCTGGAGT TACCCTGATG AAAAAAATAA GAGAGCTTCC 1080 

GTAAGGCGAC GAATTGACCA AAGCAATTCC CATGCCAACA TATTCTACAG TGTTCTTACT 1140

ATTGACAAAA TGCAGAACAA AGACAAAGGA CTTTATACTT GTCGTGTAAG GAGTGGACCA 1200 

TCATTCAAAT CTGTTAACAC CTCAGTGCAT ATATATGATA AAGCATTCAT CACTGTGAAA 1260 

CATCGAAAAC AGCAGGTGCT TGAAACCGTA GCTGGCAAGC GGTCTTACCG GCTCTCTATG 1320 

AAAGTGAAGG CATTTCCCTC GCCGGAAGTT GTATGGTTAA AAGATGGGTT ACCTGCGACT 1380 

GAGAAATCTG CTCGCTATTT GACTCGTGGC TACTCGTTAA TTATCAAGGA CGTAACTGAA 1440

GAGGATGCAG GGAATTATAC AATCTTGCTG AGCATAAAAC AGTCAAATGT GTTTAAAAAC 1500 

CTCACTGCCA CTCTAATTGT CAATGTGAAA CCCCAGATTT ACGAAAAGGC CGTGTCATCG 1560 

TTTCCAGACC CGGCTCTCTA CCCACTGGGC AGCAGACAAA TCCTGACTTG TACCGCATAT 1620

GGTATCCCTC AACCTACAAT CAAGTGGTTC TGGCACCCCT GTAACCATAA TCATTCCGAA 1680 

GCAAGGTGTG ACTTTTGTTC CAATAATGAA GAGTCCTTTA TCCTGGATGC TGACAGCAAC 1740

ATGGGAAACA GAATTGAGAG CATCACTCAG CGCATGGCAA TAATAGAAGG AAAGAATAAG 1800

ATGGCTAGCA CCTTGGTTGT GGCTGACTCT AGAATTTCTG GAATCTACAT TTGCATAGCT 1860 

TCCAATAAAG TTGGGACTGT GGGAAGAAAC ATAAGCTTTT ATATCACAGA TGTGCCAAAT 1920 

GGGTTTCATG TTAACTTGGA AAAAATGCCG ACGGAAGGAG AGGACCTGAA ACTGTCTTGC 1980

ACAGTTAACA AGTTCTTATA CAGAGACGTT ACTTGGATTT TACTGCGGAC AGTTAATAAC 2040

AGAACAATGC ACTACAGTAT TAGCAAGCAA AAAATGGCCA TCACTAAGGA GCACTCCATC 2100 

ACTCTTAATC TTACCATCAT GAATGTTTCC CTGCAAGATT CAGGCACCTA TGCCTGCAGA 2160 

GCCAGGAATG TATACACAGG GGAAGAAATC CTCCAGAAGA AAGAAATTAC AATCAGAGAT 2220

CAGGAAGCAC CATACCTCCT GCGAAACCTC AGTGATCACA CAGTGGCCAT CAGCAGTTCC 2280 

ACCACTTTAG ACTGTCATGC TAATGGTGTC CCCGAGCCTC AGATCACTTG GTTTAAAAAC 2340

AACCACAAAA TACAACAAGA GCCTGGAATT ATTTTAGGAC CAGGAAGCAG CACGCTGTTT 2400 

ATTGAAAGAG TCACAGAAGA GGATGAAGGT GTCTATCACT GCAAAGCCAC CAACCAGAAG 2460 

GGCTCTGTGG AAAGTTCAGC ATACCTCACT GTTCAAGGAA CCTCGGACAA GTCTAATCTG 2520 

GAGCTGATCA CTCTAACATG CACCTGTGTG GCTGCGACTC TCTTCTGGCT CCTATTAACC 2580 

CTCCTTATCC GAAAAATGAA AAGGTCTTCT TCTGAAATAA AGACTGACTA CCTATCAATT 2640 

ATAATGGACC CAGATGAAGT TCCTTTGGAT GAGCAGTGTG AGCGGCTCCC TTATGATGCC 2700 

AGCAAGTGGG AGTTTGCCCG GGAGAGACTT AAACTGGGCA AATCACTTGG AAGAGGGGCT 2760

TTTGGAAAAG TGGTTCAAGC ATCAGCATTT GGCATTAAGA AATCACCTAC GTGCCGGACT 2820 

GTGGCTGTGA AAATGCTGAA AGAGGGGGCC ACGGCCAGCG AGTACAAAGC TCTGATGACT 2880 

GAGCTAAAAA TCTTGACCCA CATTGGCCAC CATCTGAACG TGGTTAACCT GCTGGGAGCC 2940 

TGCACCAAGC AAGGAGGGCC TCTGATGGTG ATTGTTGAAT ACTGCAAATA TGGAAATCTC 3000 

TCCAACTACC TCAAGAGCAA ACGTGACTTA TTTTTTCTCA ACAAGGATGC AGCACTACAC 3060 

ATGGAGCCTA AGAAAGAAAA AATGGAGCCA GGCCTGGAAC AAGGCAAGAA ACCAAGACTA 3120 

GATAGCGTCA CCAGCAGCGA AAGCTTTGCG AGCTCCGGCT TTCAGGAAGA TAAAAGTCTG 3180 

AGTGATGTTG AGGAAGAGGA GGATTCTGAC GGTTTCTACA AGGAGCCCAT CACTATGGAA 3240 

GATCTGATTT CTTACAGTTT TCAAGTGGCC AGAGGCATGG AGTTCCTGTC TTCCAGAAAG 3300

TGCATTCATC GGGACCTGGC AGCGAGAAAC ATTCTTTTAT CTGAGAACAA CGTGGTGAAG 3360 

ATTTGTGATT TTGGCCTTGC CCGGGATATT TATAAGAACC CCGATTATGT GAGAAAAGGA 3420 

GATACTCGAC TTCCTCTGAA ATGGATGGCT CCCGAATCTA TCTTTGACAA AATCTACAGC 3480

ACCAAGAGCG ACGTGTGGTC TTACGGAGTA TTGCTGTGGG AAATCTTCTC CTTAGGTGGG 3540

TCTCCATACC CAGGAGTACA AATGGATGAG GACTTTTGCA GTCGCCTGAG GGAAGGCATG 3600 

AGGATGAGAG CTCCTGAGTA CTCTACTCCT GAAATCTATC AGATCATGCT GGACTGCTGG 3660 

CACAGAGACC CAAAAGAAAG GCCAAGATTT GCAGAACTTG TGGAAAAACT AGGTGATTTG 3720 

CTTCAAGCAA ATGTACAACA GGATGGTAAA GACTACATCC CAATCAATGC CATACTGACA 3780 

GGAAATAGTG GGTTTACATA CTCAACTCCT GCCTTCTCTG AGGACTTCTT CAAGGAAAGT 3840

ATTTCAGCTC CGAAGTTTAA TTCAGGAAGC TCTGATGATG TCAGATATGT AAATGCTTTC 3900 

AAGTTCATGA GCCTGGAAAG AATCAAAACC TTTGAAGAAC TTTTACCGAA TGCCACCTCC 3960 

ATGTTTGATG ACtSjCAGGG CGACAGCAGC ACTCTGTTGG CCTCTCCCAT GCTGAAGCGC 4020 

TTCACCTGGA CTGACAGCAA ACCCAAGGCC TCGCTCAAGA TTGACTTGAG AGTAACCAGT 4080

AAAAGTAAGG AGTCGGGGCT GTCTGATGTC AGCAGGCCCA GTTTCTGCCA TTCCAGCTGT 4140

GGGCACGTCA GCGAAGGCAA GCGCAGGTTC ACCTACGACC ACGCTGAGCT GGAAAGGAAA 4200

ATCGCGTGCT GCTCCCCGCC CCCAGACTAC AACTCGGTGG TCCTGTACTC CACCCCACCC 4260 

ATCTAGAGTT TGACACGAAG CCTTATTTCT AGAAGCACAT GTGTATTTAT ACCCCCAGGA 4320

AACTAGCTTT TGCCAGTATT ATGCATATAT AAGTTTACAC CTTTATCTTT CCATGGGAGC 4380

CAGCTGCTTT TTGTGATTTT TTTAATAGTG CTTTTTTTTT TTGACTAACA AGAATGTAAC 4440

TCCAGATAGA GAAATAGTGA CAAGTGAAGA ACACTACTGC TAAATCCTCA TGTTACTCAG 4500

TGTTAGAGAA ATCCTTCCTA AACCCAATGA CTTCCCTGCT CCAACCCCCG CCACCTCAGG 4560

GCACGCAGGA CCAGTTTGAT TGAGGAGCTG CACTGATCAC CCAATGCATC ACGTACCCCA 4620 

CTGGGCCAGC CCTGCAGCCC AAAACCCAGG GCAACAAGCC CGTTAGCCCC AGGGGATCAG 4680 

TGGCTGGCCT GAGCAACATC TCGGGAGTCC TCTAGCAGGC CTAAGACATG TGAGGAGGAA 4740 

AAGGAAAAAA AGCAAAAAGC AAGGGAGAAA AGAGAAACCG GGAGAAGGCA TGAGAAAGAA 4800 

TTTGAGACGC ACCATGTGGG CACGGAGGGG GACGGGGCTC AGCAATGCCA TTTCAGTGGC 4860 

TTCCCAGCTC TGACCCTTCT ACATTTGAGG GCCCAGCCAG GAGCAGATGG ACAGCGATGA 4920 

GGGGACATTT TCTGGATTCT GGGAGGCAAG AAAAGGACAA ATATCTTTTT TGGAACTAAA 4980

GCAAATTTTA GACCTTTACC TATGGAAGTG GTTCTATGTC CATTCTCATT CGTGGCATGT 5040 

TTTGATTTGT AGCACTGAGG GTGGCACTCA ACTCTGAGCC CATACTTTTG GCTCCTCTAG 5100 

TAAGATGCAC TGAAAACTTA GCCAGAGTTA GGTTGTCTCC AGGCCATGAT GGCCTTACAC 5160 

TGAAAATGTC ACATTCTATT TTGGGTATTA ATATATAGTC CAGACACTTA ACTCAATTTC 5220 

TTGGTATTAT TCTGTTTTGC ACAGTTAGTT GTGAAAGAAA GCTGAGAAGA ATGAAAATGC 5280 

AGTCCTGAGG AGAGTTTTCT CCATATCAAA ACGAGGGCTG ATGGAGGAAA AAGGTCAATA 5340

AGGTCAAGGG AAGACCCCGT CTCTATACCA ACCAAACCAA TTCACCAACA CAGTTGGGAC 5400 

CCAAAACACA GGAAGTCAGT CACGTTTCCT TTTCATTTAA TGGGGATTCC ACTATCTCAC 5460 

ACTAATCTGA AAGGATGTGG AAGAGCATTA GCTGGCGCAT ATTAAGCACT TTAAGCTCCT 5520 

TGAGTAAAAA GGTGGTATGT AATTTATGCA AGGTATTTCT CCAGTTGGGA CTCAGGATAT 5580 

TAGTTAATGA GCCATCACTA GAAGAAAAGC CCATTTTCAA CTGCTTTGAA ACTTGCCTGG 5640 

GGTCTGAGCA TGATGGGAAT AGGGAGACAG GGTAGGAAAG GGCGCCTACT CTTCAGGGTC 5700 

TAAAGATCAA GTGGGCCTTG GATCGCTAAG CTGGCTCTGT TTGATGCTAT TTATGCAAGT 5760 

TAGGGTCTAT GTATTTAGGA TGCGCCTACT CTTCAGGGTC TAAAGATCAA GTGGGCCTTG 5820 

GATCGCTAAG CTGGCTCTGT TTGATGCTAT TTATGCAAGT TAGGGTCTAT GTATTTAGGA 5880 

TGTCTGCACC TTCTGCAGCC AGTCAGAAGC TGGAGAGGCA ACAGTGGATT GCTGCTTCTT 5940

GGGGAGAAGA GTATGCTTCC TTTTATCCAT GTAATTTAAC TGTAGAACCT GAGCTCTAAG 6000

TAACCGAAGA ATGTATGCCT CTGTTCTTAT GTGCCACATC CTTGTTTAAA GGCTCTCTGT 6060 

ATGAAGAGAT GGGACCGTCA TCAGCACATT CCCTAGTGAG CCTACTGGCT CCTGGCAGCG 6120 

GCTTTTGTGG AAGACTCACT AGCCAGAAGA GAGGAGTGGG ACAGTCCTCT CCACCAAGAT 6180 

CTAAATCCAA ACAAAAGCAG GCTAGAGCCA GAAGAGAGGA CAAATCTTTG TTGTTCCTCT 6240 

TCTTTACACA TACGCAAACC ACCTGTGACA GCTGGCAATT TTATAAATCA GGTAACTGGA 6300

AGGAGGTTAA ACTCAGAAAA AAGAAGACCT CAGTCAATTC TCTACTTTTT TTTTTTTTTT 6360 

TCCAAATCAG ATAATAGCCC AGCAAATAGT GATAACAAAT AAAACCTTAG CTGTTCATGT 6420 

CTTGATTTCA ATAATTAATT CTTAATCATT AAGAGACCAT AATAAATACT CCTTTTCAAG 6480 

AGAAAAGCAA AACCATTAGA ATTGTTACTC AGCTCCTTCA AACTCAGGTT TGTAGCATAC 6540 

ATGAGTCCAT CCATCAGTCA AAGAATGGTT CCATCTGGAG TCTTAATGTA GAAAGAAAAA 6600 

TGGAGACTTG TAATAATGAG CTAGTTACAA AGTGCTTGTT CATTAAAATA GCACTGAAAA 6660 

TTGAAACATG AATTAACTGA TAATATTCCA ATCATTTGCC ATTTATGACA AAAATGGTTG 6720 

GCACTAACAA AGAACGAGCA CTTCCTTTCA GAGTTTCTGA GATAATGTAC GTGGAACAGT 6780 

CTGGGTGGAA TGGGGCTGAA ACCATGTGCA AGTCTGTGTC TTGTCAGTCC AAGAAGTGAC 6840 

ACCGAGATGT TAATTTTAGG GACCCGTGCC TTGTTTCCTA GCCCACAAGA ATGCAAACAT 6900 

CAAACAGATA CTCGCTAGCC TCATTTAAAT TGATTAAAGG AGGAGTGCAT CTTTGGCCGA 6960 

CAGTGGTGTA ACTGTGTGTG TGTGTGTGTG TGTGTGTGTG TGTGTGTGTG TGTGGGTGTG 7020 

GGTGTATGTG TGTTTTGTGC ATAACTATTT AAGGAAACTG GAATTTTAAA GTTACTTTTA 7080 

TACAAACCAA GAATATATGC TACAGATATA AGACAGACAT GGTTTGGTCC TATATTTCTA 7140 

GTCATGATGA ATGTATTTTG TATACCATCT TCATATAATA TACTTAAAAA TATTTCTTAA 7200 

TTGGGATTTG TAATCGTACC AACTTAATTG ATAAACTTGG CAACTGCTTT TATGTTCTGT 7260 

CTCCTTCCAT AAATTTTTCA AAATACTAAT TCAACAAAGA AAAAGCTCTT TTTTTTCCTA 7320

AAATAAACTC AAATTTATCC TTGTTTAGAG CAGAGAAAAA TTAAGAAAAA CTTTGAAATG 7380

GTCTCAAAAA ATTGCTAAAT ATTTTCAATG GAAAACTAAA TGTTAGTTTA GCTGATTGTA 7440 

TGGGGTTTTC GAACCTTTCA CTTTTTGTTT GTTTTACCTA TTTCACAACT GTGTAAATTG 7500 

CCAATAATTC CTGTCCATGA AAATGCAAAT TATCCAGTGT AGATATATTT GACCATCACC 7560 

CTATGGATAT TGGCTAGTTT TGCCTTTATT AAGCAAATTC ATTTCAGCCT GAATGTCTGC 7620 
CTATATATTC TCTGCTCTTT GTATTCTCCT TTGAACCCGT TAAAACATCC TGTGGCACTC 



ACJ9 DNA sequence

Gene name: Purine nucleoside phosphorylase 

Unigene number: Hs.75514

Probeset Accession #: K02574 

Nucleic acid Accession #: X00737 cluster

Coding sequence: 110-979 (predicted start/stop codons underlined) 

AACTGTGCGA ACCAGACCCG GCAGCCTTGC TCAGTTCAGC ATAGCGGAGC GGATCCGATC 60 

GGATCGGAGC ACACCGGAGC AGGCTCATCG AGAAGGCGTC TGCGAGACCA TGGAGAACGG 120

ATACACCTAT GAAGATTATA AGAACACTGC AGAATGGCTT CTGTCTCATA CTAAGCACCG 180 

ACCTCAAGTT GCAATAATCT GTGGTTCTGG ATTAGGAGGT CTGACTGATA AATTAACTCA 240

GGCCCAGATC TTTGACTACA GTGAAATCCC CAACTTTCCT CGAAGTACAG TGCCAGGTCA 300 

TGCTGGCCGA CTGGTGTTTG GGTTCCTGAA TGGCAGGGCC TGTGTGATGA TGCAGGGCAG 360
GTTCCACATG TATGAAGGGT ACCCACTCTG GAAGGTGACA TTCCCAGTGA GGGTTTTCCA 420

CCTTCTGGGT GTGGACACCC TGGTAGTCAC CAATGCAGCA GGAGGGCTGA ACCCCAAGTT 480 

TGAGGTTGGA GATATCATGC TGATCCGTGA CCATATCAAC CTACCTGGTT TCAGTGGTCA 540 

GAACCCTCTC AGAGGGCCCA ATGATGAAAG GTTTGGAGAT CGTTTCCCTG CCATGTCTGA 600 

TGCCTACGAC CGGACTATGA GGCAGAGGGC TCTCAGTACC TGGAAACAAA TGGGGGAGCA 660 

ACGTGAGCTA CAGGAAGGCA CCTATGTGAT GGTGGCAGGC CCCAGCTTTG AGACTGTGGC 720 

AGAATGTCGT GTGCTGCAGA AGCTGGGAGC AGACGCTGTT GGCATGAGTA CAGTACCAGA 780 

AGTTATCGTT GCACGGCACT GTGGACTTCG AGTCTTTGGC TTCTCACTCA TCACTAACAA 840 

GGTCATCATG GATTATGAAA GCCTGGAGAA GGCCAACCAT GAAGAAGTCT TAGCAGCTGG 900

CAAACAAGCT GCACAGAAAT TGGAACAGTT TGTCTCCATT CTTATGGCCA GCATTCCACT 960 

CCCTGACAAA GCCAGTTGAC CTGCCTTGGA GTCGTCTGGC ATCTCCCACA CAAGACCCAA 1020 

GTAGCTGCTA CCTTCTTTGG CCCCTTGCTG GAGTCATGTG CCTCTGTCCT TAGGTTGTAG 1080 

CAGAAAGGAA AAGATTCCTG TCCTTCACCT TTCCCACTTT CTTCTACCAG ACCCTTCTGG 1140 

TGCCAGATCC TCTTCTCAAA GCTGGGATTA CAGGTGTGAG CATAGTGAGA CCTTGGCGCT 1200 

ACAAAATAAA GCTGTTCTCA TTCCTGTTCT TTCTTACACA AGAGCTGGAG CCCGTGCCCT 1260 

ACCACACATC TGTGGAGATG CCCAGGAfTT GACTCGGGCC TTAGAACTTT GCATAGCAGC 1320 

TGCTACTAGC TCTTTGAGAT AATACATTCC GAGGGGCTCA GTTCTGCCTT ATCTAAATCA 1380 
CCAGAGACCA AACAAGGACT AATCCAATAC CTCTTGGA 



ACK4 DNA sequence 

Gene name: EST 

Unigene number: Hs.265499

Probeset Accession #: R68763 

CAT cluster**: Cluster 46668_2 

Sequence: Both the EST corresponding to the probeset accession number and the CAT cluster align with the Homo sapiens BAC clone
AC009414 RP11-490M8. Using FGENESH exon prediction, 2 exons were predicted on this BAC clone upstream of the probeset.
Predicted exon 1: bases 5808-5837 of BAC clone AC009414 

AAAGTCTCGC CCAAACTTTG TTCGGCACAA CCAGCGCCGA GGGGGCGGCG CAGGCCAGGT 60 

GGGAGGGGGC CCGCAGCGGG CGGCCGTACC TTCGCAAACG CCCGCTTCGT ACTCGGTGAG 120 

GGAGTCGCCA TTGAGCGGGG GGCGGATGAC ACAACGCAGC CCCCGGTCGC AGGTTCCGTA 180 

AATCCCGAAG GTGCCGCCGC AGCTCTCGTT CCTCTGGCTG GCGCACGTGT AGCAGCAGCC 240 

GCAGACGCCC TGCACGATGC TCCCCGGGCA GTTCCTGGGC TCCTCGCACT TGGACTCGTC 300 

ACAGGGCAGG CAGACCAGCG CCCGGGTGCC GGAGCGCGCC AGCAGCAGCA GCAGCCCCAG 360 

CAGCGAGACC AGGAGGTGCC CGCAGCCGGC CAACCCCCTG TCCCCCGCCA CCAAGTACAT 420 

CCTCCTGCGC CGCCGCCGCC TCCTCCTCGC AGCCGGGCCG GGAGCGGGGC GGGCGCCCTC 480 

CCCTGCGCGG GGCACACGCG CCGCCGCCGC CGCACCAGCA GCCCGCGGTC CTCACCGCCC 540 

CTCTCGGGGC CCCCGGGGCG CGCCTCCCCT CGCGGGGCGA GGCCCCCGCC CCTTCTGCGG 600 

GCCGCGCCGA CCCCGAGCCC ACGAGCCTTG GCGCCGGCGG CAGCTTCCCC TCCTCCTCCT 660 

CCTCCTCCTC CCGGGAGGGA GGGGGAAAAA AGAAAAAAGT TTCCTCCCGG CAGCTCCGGT 720 

TCAACCCAAA CTTCTGGCGC GGCGGCGGCG GTGGCTGCTG CGCTCGGCTC CAGCCCGGGC 780 

CGGCGGCGCC TCCTCCCTCT CCTCCTCCGA GTCGGCCGGC CCCGCAGCGG CGCAGCCTCC 840 

GGGCCGGTCC CCGCCTCCCG AGCTGCCGAG TGGGCGCGGT GGCGCAGCAC AAGATCCGCG 900 

GCGTCCGCTC CGCGCGCCCC GCTCGCCTCA CTCCTGCGCC GCTCCTCCGG GCGCTTGTTT 960 

ATGGCTGGAG CCTCAGCCGC TCGGGCTGCG CCCTCCCCCA TCCTACCTCC TCCCCCAGAC 1020 

CTTCCCCCCA CCCCCACGCG CCGCGCGCCG CTCATTGGCT GCCCCCCCTC CCCGGCCCGG 1080 

CCGGCCCCCT CCGCCTCCCC CTCCCCCTCT CGGGCGGCCG GGCCCTTCCT CCCTCCCTCA 1140 

CACGCCTCCA CCTCTTCCCG ATCTCCTCCT CCCCGAGCCC GGCGCACCGA GCCGGCCGTG 1200 

CCACCGAGCT GCGGCTCTGG CCCCGGCGCC GCGGGTGCGC TGCGGATGGG CTTGGGGCGC 1260 

ACCCAGCGAG CAGCGAGAGT CGCGGTGTCC CGGGCGCTCG CTGGCACCGT GGCCGCAGCG 1320 

GCCGGCCTGG GAGCCAGGAG GGCGAGGCGG CTGCACCTTC GGGGCCAGAT TGGAGTTCGA 1380

AGAGTGGCGG GTACCCCAGA AGCTCGGGGC CGGGGCGATG GCTGCAGCCT CGGGAGGGTA 1440 
TCGCCGGATC GAACTCCGGG AAAGGGAAGC AAAGGCATGG AACCTCCGCA CACTGGATGA

Predicted ACK4 gene seq (predicted start/stop codons underlined) 

ATGCCCCCGG AACAGCATCA TCAGCCCAAC AAAGTCTCGC CCAAACTTTG TTCGGCACAA 60

CCAGCGCCGA GGGGGCGGCG CAGGCCAGGT GGGAGGGGGC CCGCAGCGGG CGGCCGTACC 120 

TTCGCAAACG CCCGCTTCGT ACTCGGTGAG GGAGTCGCCA TTGAGCGGGG GGCGGATGAC 180

ACAACGCAGC CCCCGGTCGC AGGTTCCGTA AATCCCGAAG GTGCCGCCGC AGCTCTCGTT 240 

CCTCTGGCTG GCGCACGTGT AGCAGCAGCC GCAGACGCCC TGCACGATGC TCCCCGGGCA 300

GTTCCTGGGC TCCTCGCACT TGGACTCGTC ACAGGGCAGG CAGACCAGCG CCCGGGTGCC 360

GGAGCGCGCC AGCAGCAGCA GCAGCCCCAG CAGCGAGACC AGGAGGTGCC CGCAGCCGGC 420 

CAACCCCCTG TCCCCCGCCA CCAAGTACAT CCTCCTGCGC CGCCGCCGCC TCCTCCTCGC 480

AGCCGGGCCG GGAGCGGGGC GGGCGCCCTC CCCTGCGCGG GGCACACGCG CCGCCGCCGC 540

CGCACCAGCA GCCCGCGGTC CTCACCGCCC CTCTCGGGGC CCCCGGGGCG CGCCTCCCCT 600 

CGCGGGGCGA GGCCCCCGCC CCTTCTGCGG GCCGCGCCGA CCCCGAGCCC ACGAGCCTTG 660 

GCGCCGGCGG CAGCTTCCCC TCCTCCTCCT CCTCCTCCTC CCGGGAGGGA GGGGGAAAAA 720 

AGAAAAAAGT TTCCTCCCGG CAGCTCCGGT TCAACCCAAA CTTCTGGCGC GGCGGCGGCG 780 

GTGGCTGCTG CGCTCGGCTC CAGCCCGGGC CGGCGGCGCC TCCTCCCTCT CCTCCTCCGA 840 

GTCGGCCGGC CCCGCAGCGG CGCAGCCTCC GGGCCGGTCC CCGCCTCCCG AGCTGCCGAG 900 

TGGGCGCGGT GGCGCAGCAC AAGATCCGCG GCGTCCGCTC CGCGCGCCCC GCTCGCCTCA 960 

CTCCTGCGCC GCTCCTCCGG GCGCTTGTTT ATGGCTGGAG CCTCAGCCGC TCGGGCTGCG 1020 

CCCTCCCCCA TCCTACCTCC TCCCCCAGAC CTTCCCCCCA CCCCCACGCG CCGCGCGCCG 1080 

CTCATTGGCT GCCCCCCCTC CCCGGCCCGG CCGGCCCCCT CCGCCTCCCC CTCCCCCTCT 1140 

CGGGCGGCCG GGCCCTTCCT CCCTCCCTCA CACGCCTCCA CCTCTTCCCG ATCTCCTCCT 1200 

CCCCGAGCCC GGCGCACCGA GCCGGCCGTG CCACCGAGCT GCGGCTCTGG CCCCGGCGCC 1260 

GCGGGTGCGC TGCGGATGGG CTTGGGGCGC ACCCAGCGAG CAGCGAGAGT CGCGGTGTCC 1320 

CGGGCGCTCG CTGGCACCGT GGCCGCAGCG GCCGGCCTGG GAGCCAGGAG GGCGAGGCGG 1380

CTGCACCTTC GGGGCCAGAT TGGAGTTCGA AGAGTGGCGG GTACCCCAGA AGCTCGGGGC 1440 

CGGGGCGATG GCTGCAGCCT CGGGAGGGTA TCGCCGGATC GAACTCCGGG AAAGGGAAGC 1500
AAAGGCATGG AACCTCCGCA CACTGGATGA 



AAA8 DNA sequence 

Gene name: ETL protein, with extended open reading frame 
Unigene number: Hs.57958
Probeset Accession #: D58024 
Nucleotide Accession #: AF192403 

Coding sequence: 151-2136. Underlined sequences correspond to extended sequence
not included in AF192403.

ATGAAAACAG CCGCACTCAC TCCGCCGCGC TCTCCGCCAC CGCCACCACT GCGGCCACCG 60

CCAATGAAAC GCCTCCCGCT CCTAGTGGTT TTTTCCACTT TGTTGAATTG TTCCTATACT 120

CAAAATTGCA CCAAGACACC TTGTCTCCCA AATGCAAAAT GTGAAATACG CAATGGAATT 180

GAAGCCTGCT ATTGCAACAT GGGATTTTCA GGAAATGGTG TCACAATTTG TGAAGATGAT 240

AATGAATGTG GAAATTTAAC TCAGTCCTGT GGCGAAAATG CTAATTGCAC TAACACAGAA 300

GGAAGTTATT ATTGTATGTG TGTACCTGGC TTCAGATCCA GCAGTAACCA AGACAGGTTT 360

ATCACTAATG ATGGAACCGT CTGTATAGAA AATGTGAATG CAAACTGCCA TTTAGATAAT 420

GTCTGTATAG CTGCAAATAT TAATAAAACT TTAACAAAAA TCAGATCCAT AAAAGAACCT 480

GTGGCTTTGC TACAAGAAGT CTATAGAAAT TCTGTGACAG ATCTTTCACC AACAGATATA 540

ATTACATATA TAGAAATATT AGCTGAATCA TCTTCATTAC TAGGTTACAA GAACAACACT 600

ATCTCAGCCA AGGACACCCT TTCTAACTCA ACTCTTACTG AATTTGTAAA AACCGTGAAT 660

AATTTTGTTC AAAGGGATAC ATTTGTAGTT TGGGACAAGT TATCTGTGAA TCATAGGAGA 720

ACACATCTTA CAAAACTCAT GCACACTGTT GAACAAGCTA CTTTAAGGAT ATCCCAGAGC 780

TTCCAAAAGA CCACAGAGTT TGATACAAAT TCAACGGATA TAGCTCTCAA AGTTTTCTTT 840

TTTGATTCAT ATAACATGAA ACATATTCAT CCTCATATGA ATATGGATGG AGACTACATA 900

AATATATTTC CAAAGAGAAA AGCTGCATAT GATTCAAATG GCAATGTTGC AGTTGCATTT 960

TTATATTATA AGAGTATTGG TCCTTTGCTT TCATCATCTG ACAACTTCTT ATTGAAACCT 1020

CAAAATTATG ATAATTCTGA AGAGGAGGAA AGAGTCATAT CTTCAGTAAT TTCAGTCTCA 1080 

ATGAGCTCAA ACCCACCCAC ATTATATGAA CTTGAAAAAA TAACATTTAC ATTAAGTCAT 1140

CGAAAGGTCA CAGATAGGTA TAGGAGTCTA TGTGCATTTT GGAATTACTC ACCTGATACC 1200

ATGAATGGCA GCTGGTCTTC AGAGGGCTGT GAGCTGACAT ACTCAAATGA GACCCACACC 1260 

TCATGCCGCT GTAATCACCT GACACATTTT GCAATTTTGA TGTCCTCTGG TCCTTCCATT 1320 

GGTATTAAAG ATTATAATAT TCTTACAAGG ATCACTCAAC TAGGAATAAT TATTTCACTG 1380

ATTTGTCTTG CCATATGCAT TTTTACCTTC TGGTTCTTCA GTGAAATTCA AAGCACCAGG 1440

ACAACAATTC ACAAAAATCT TTGCTGTAGC CTATTTCTTG CTGAACTTGT TTTTCTTGTT 1500

GGGATCAATA CAAATACTAA TAAGCTCNTT TCTGTTTCAA TCATTGCCGG ACTGCTACAC 1560

TACTTCTTTT TAGCTGCTTT TGCATGGATG TGCATTGAAG GCATACATCT CTATCTCATT 1620

GTTGTGGGTG TCATCTACAA CAAGGGATTT TTGCACAAGA ATTTTTATAT CTTTGGCTAT 1680

CTAAGCCCAG CCGTGGTAGT TGGATTTTCG GCAGCACTAG GATACAGATA TTATGGCACA 1740

ACAAAAGTAT GTTGGCTTAG CACCGAAACA CACTTTATTT GGAGTTTTAT AGGACCAGCA 1800

TGCCTAATCA TTCTTGTTAA TCTCTTGGCT TTTGGAGTCA TCATATACAA AGTTTTTCGT 1860

CACACTGCAG GGTTGAAACC AGAAGTTAGT TGCTTTGAGA ACATAAGGTC TTGTGCAAGA 1920

GGAGCCCTCG CTCTTCTGTT CCTTCTCGGC ACCACCTGGA TCTTTGGGGT TCTCCATGTT 1980 

GTGCACGCAT CAGTGGTTAC AGCTTACCTC TTCACAGTCA GCAATGCTTT CCAGGGGATG 2040

TTCATTTTTT TATTCCTGTG TGTTTTATCT AGAAAGATTC AAGAAGAATA TTACAGATTG 2100 

TTCAAAAATG TCCCCTGTTG TTTTGGATGT TTAAGGTAAA CATAGAGAAT GGTGGATAAT 2160 

TACAACTGCA CTAAAAATAA AAATTCCAAG CTGTGGATGA CCAATGTATA AAAATGACTC 2220 

ATCAAATTAT CCAATTATTA ACTACTAGAC AAAAAGTATT TTAAATCAGT TTTTCTGTTT 2280 

ATGCTATAGG AACTGTAGAT AATAAGGTAA AATTATGTAT CATATAGATA TACTATGTTT 2340

TTCTATGTGA AATAGTTCTG TCAAAAATAG TATTGCAGAT ATTTGGAAAG TAATTGGTTT 2400 

CTCAGGAGTG ATATCACTGC ACCCAAGGAA AGATTTTCTT TCTAACACGA GAAGTATATG 2460
AATGTCCTGA AGGAAACCAC TGGCTTGATA TTTCTGTGAC TCGTGTTGCC TTTGAAACTA 2520

GTCCCCTACC ACCTCGGTAA TGAGCTCCAT TACAGAAAGT GGAACATAAG AGAATGAAGG 2580 

GGCAGAATAT CAAACAGTGA AAAGGGAATG ATAAGATGTA TTTTGAATGA ACTGTTTTTT 2640 

CTGTAGACTA GCTGAGAAAT TGTTGACATA AAATAAAGAA TTGAAGAAAC ACATTTTACC 2700 

ATTTTGTGAA TTGTTCTGAA CTTAAATGTC CACTAAAACA ACTTAGACTT CTGTTTGCTA 2760 

AATCTGTTTC TTTTTCTAAT ATTCTAAAAA AAAAAAAAAG GTTTMCCYCC CAAATTGAAA 2820 

AAAAAAGGGA AAAAAAAATC TGTTTCTAAG GTTAGACTGA GATATATACT ATTTCCTTAC 2880 
TTATTTCACA GATTGTGACT TTGGATAGTT AATCAGTAAA ATATAAATGT GTCGA 



AAC6 DNA sequence 

Gene name: Homo sapiens cDNA FLJ13465 fis, clone PLACE1003493, weakly similar to endothelial cell multimerin precursor

Unigene number: Hs.134797

Probeset Accession #: AA025351 

Nucleotide Accession #: AK023527 

Coding sequence: predicted 75-2921 

Extended sequence: 729-3465 (underlined sequence)

AAGACAACGT CACTAGCAGT TTCTGGAGCT ACTTGCCAAG GCTGAGTGTG AGCTGAGCCT 60 

GCCCCACCAC CAAGATGATC CTGAGCTTGC TGTTCAGCCT TGGGGGCCCC CTGGGCTGGG 120

GGCTGCTGGG GGCATGGGCC CAGGCTTCCA GTACTAGCCT CTCTGATCTG CAGAGCTCCA 180 

GGACACCTGG GGTCTGGAAG GCAGAGGCTG AGGACACCAG CAAGGACCCC GTTGGACGTA 240 

ACTGGTGCCC CTACCCAATG TCCAAGCTGG TCACCTTACT AGCTCTTTGC AAAACAGAGA 300 

AATTCCTCAT CCACTCGCAG CAGCCGTGTC CGCAGGGAGC TCCAGACTGC CAGAAAGTCA 360 

AAGTCATGTA CCGCATGGCC CACAAGCCAG TGTACCAGGT CAAGCAGAAG GTGCTGACCT 420 

CTTTGGCCTG GAGGTGCTGC CCTGGCTACA CGGGCCCCAA CTGCGAGCAC CACGATTCCA 480

TGGCAATCCC TGAGCCTGCA GATCCTGGTG ACAGCCACCA GGAACCTCAG GATGGACCAG 540 

TCAGCTTCAA ACCTGGCCAC CTTGCTGCAG TGATCAATGA GGTTGAGGTG CAACAGGAAC 600

AGCAGGAACA TCTGCTGGGA GATCTCCAGA ATGATGTGCA CCGGGTGGCA GACAGCCTGC 660

CAGGCCTGTG GAAAGCCCTG CCTGGTAACC TCACAGCTGC AGTGATGGAA GCAAATCAAA 720 

PAGGGCACGA GTTCCCTGAT AGATCCTTGG AGCAGGTGCT GCTACCCCAC GTGGACACCT 780

TCCTACAAGT GCATTTCAGC CCCATCTGGA GGAGCTTTAA CCAAAGCCTG CACAGCCTTA 840 

CCCAGGCCAT AAGAAACCTG TCTCTTGACG TGGAGGCCAA CCGCCAGGCC ATCTCCAGAG 900

TCCAGGACAG TGCCGTGGCC AGGGCTGACT TCCAGGAGCT TGGTGCCAAA TTTGAGGCCA 960 

AGGTCCAGGA GAACACTCAG AGAGTGGGTC AGCTGCGACA GGACGTGGAG GACCGCCTGC 1020

ACGCCCAGCA CTTTACCCTG CACCGCTCGA TCTCAGAGCT CCAAGCCGAT GTGGACACCA 1080

AATTGAAGAG GCTGCACAAG GCTCAGGAGG CCCCAGGGAC CAATGGCAGT CTGGTGTTGG 1140

CAACGCCTGG GGCTGGGGCA AGGCCTGAGC CGGACAGCCT GCAGGCCAGG CTGGGCCAGC 1200

TGCAGAGGAA CCTCTCAGAG CTGCACATGA CCACGGCCCG CAGGGAGGAG GAGTTGCAGT 1260

ACACCCTGGA GGACATGAGG GCCACCCTGA CCCGGCACGT GGATGAGATC AAGGAACTGT 1320

ACTCCGAATC GGACGAGACT TTCGATCAGA TTAGCAAGGT GGAGCGGCAG GTGGAGGAGC 1380

TGCAGGTGAA CCACACGGCG CTCCGTGAGC TGCGCGTGAT CCTGATGGAG AAGTCTCTGA 1440

TCATGGAGGA GAACAAGGAt-r GAGGTGGAGC GGCAGCTCCT GGAGCTCAAC CTCACGCTGC 1500

AGCACCTGCA GGGTGGCCAT GCCGACCTCA TCAAGTACGT GAAGGACTGC AATTGCCAGA 1560

AGCTCTATTT AGACCTGGAC GTCATCCGGG AGGGCCAGAG GGACGCCACG CGTGCCCTGG 1620 

AGGAGACCCA GGTGAGCCTG GACGAGCGGC GGCAGCTGGA CGGCTCCTCC CTGCAGGCCC 1680 

TGCAGAACGC CGTGGACGCC GTGTCGCTGG CCGTGGACGC GCACAAAGCG GAGGGCGAGC 1740

GGGCGCGGGC GGCCACGTCG CGGCTCCGGA GCCAAGTGCA GGCGCTGGAT GACGAGGTGG 1800

GCGCGCTGAA GGCGGCCGCG GCCGAGGCCC GCCACGAGGT GCGCCAGCTG CACAGCGCCT 1860

TCGCCGCCCT GCTGGAGGAC GCGCTGCGGC ACGAGGCGGT GCTGGCCGCG CTCTTCGGGG 1920

AGGAGGTGCT GGAGGAGATG TCTGAGCAGA CGCCGGGACC GCTGCCCCTG AGCTACGAGC 1980

AGATCCGCGT GGCCCTGCAG GACGCCGCTA GCGGGCTGCA GGAGCAGGHG CTCGGCTGGG 2040

ACGAGCTGGC CGCCCGAGTG ACGGCCCTGG AGCAGGCCTC GGAGCCCCCG CGGCCGGCAG 2100

AGCACCTGGA GCCCAGCCAC GACGCGGGCC GCGAGGAGGC CGCCACCACC GCCCTGGCCG 2160

GGCTGGCGCG GGAGCTCCAG AGCCTGAGCA ACGACGTCAA GAATGTCGGG CGGTGCTGCG 2220

AGGCYGAGGC CGGGGCCGGG GCCGCCTGCC TCAACGCCTC CCTTGACGGC CTCCACAACG 2280

CACTCTTCGC CACTCAGCGC AGCTTGGAGC AGCACCAGCG GCTCTTCCAC AGCCTCTTTG 2340

GGAACTTCCA AGGGCTCATG GAAGCCAACG TCAGCCTGGA CCTGGGGAAG CTGCAGACCA 2400 

TGCTGAGCAG GAAAGGGAA «fl AAGCAGCAGA AAGACCTGGA AGCTCCCCGG AAGAGGGACA 2460

AGAAGGAAGC GGAGCCTTTlT GTGGACATAC GGGTCACAGG GCCTGTGCCA GGTGCCTTGG 2520 

GCGCGGCGCT CTGGGAGGCA GRWTCCCCTG TGGCCTTCTA TGCCAGCTTT TCAGAAGGGA 2580 

CGGCTGCCCT GCAGACAGTG AAGTTCAACA CCACATACAT CAACATTGGC AGCAGCTACT 2640

TCCCTGAACA TGGCTACTTC CGAGCCCCTG AGCGTGGTGT CTACCTGTTT GCAGTGAGCG 2700

TTGAATTTGG CCCAGGGCCA GGCACCGGGC AGCTGGTGTT TGGAGGTCAC CATCGGACTC 2760

CAGTCTGTAC CACTGGGCAG GGGAGTGGAA GCACAGCAAC GGTCTTTGCC ATGGCTGAGC 2820

TGCAGAAGGG TGAGCGAGTA TGGTTTGAGT TAACCCAGGG ATCAATAACA AAGAGAAGCC 2880 

TGTCGGGCAC TGCATTTGGG GGCTTCCTGA TGTTTAAGAC CTGAACCCCA GCCCCAATCT 2940 
GATCAGACAT CATGGACTCG CCCAGCTCTC CTCGGCCTGG GGCTCTGGCC AAGGATGGGC 3000

TGGAGGTCAT TCAGTTGGTC TGTCTCTTCC CTGGAAACCT TCTGCAAAGA TGGTGTGGTG 3060

TACGTGGCTT CCCTGTAACC ACATGGGGCT TGGCCATTTC TCCATGATGA GAAGGACTGG 3120

AATGCTTCTC CGGGCAGGAC ATGGTCCTAG GAAGCCTGAA CCTTGGCTTG GCATGCCTTC 3180

TCAGACAGCA CGGCCTGGGC TCCAACTCTT CACCACACCC TGTATTCTAC AACTTCTTTG 3240

GTGTTTTGCT CCTCCTGTGG TTGGAAACTT CTGTACAACA CTTTAAACTT TTCTCTTGCT 3300

TfCTCTTCTC TTCTCCCTTA TCGTATGATA GAAAGACATT CTTCCCCAGG AGGAATGTTT 3360

AAAATGGAGG CAACATTTTG GCCAACATTG GAAAGCACTA GAGGGCAATG GGATTAAACC 3420

AACCTGCTTG GTGTCTATTA GTCAGTAATG AAGACGACAG CCTGGCCAAC CAAGGGAAAG 3480

HAAATTAGTA TCTTTAGTTT CAGTCATTCC TTGTAGGATA TGGTTTAGCT GTGCCCCCAC 3540

CTAAAATATC ATCTTGAATT GTAATCCCTA TAATCCCCAC ATCAAGGGAG AGATCAGGTG 3600

GAGGTAATTG GATCTTGGGG GCGGTTCCCC CATGCTGTTC TTGTGATAGT TCTCACGAGA 3660

TCTGATGATT TTATAAGTTT GATAGTTCCT CCTGTGTTCA TTCTCCTTCC TGCCACCTTG 3720

TGAAGATGCC TTGGTTCCTC TTCACTGTCT GCCATGATTG TAAGTTTCCT GAGGCCTCCC 3780

CAGCCATGTG GAACAGTGAG TCAATTAAAC CTCTTTCCTT TATAAATT







ACH7 DNA sequence 
Gene name: ESTs 
Unigene number: Hs.3807 
Probeset Accession #: AA292694 
BAC Accession #: AL161751 

FGENESH predicted exons: FGENESH predicts 2 exons on the minus strand of AL161751 
upstream of the ACH7 probeset. 

FGENESH predicted exon 1: 

ATGGGCAAAG ACTTCATGAC TAAAACACCA AAAGCATTTG CAACAAAAGC CAAAATTGAC 60 

AAATGGGATC TAATTAAACT AAAGAGCTTC TGCACAGCAA AAGAAACTAT CATCAGAGTG 120 

AACAGTCAAC CTACAGACTG GCAGAAAACT TTTGCAATCT ATCCATCTGA CAAAGGGGTA 180
ATAGCCAGAA TCTACAAGGA GCTTGAACAA ATTTATAAGA AAAAAAAACC AACAAAAA 

FGENESH predicted exon 2: 

CGCTCCGCAC ACATTTCCTG TCGCGGCCTA AGGGAAACTG TTGGCCGCTG GGCCCGCGGG 60 

GGGATTCTTG GCAGTTGGGG GGTCCGTCGG GAGCGAGGGC GGAGGGGAAG GGAGGGGGAA 120

CCGGGTTGGG GAAGCCAGCT GTAGAGGGCG GTGACCGCGC TCCAGACACA GCTCTGCGTC 180 

CTCGAGCGGG ACAGATCCAA GTTGGGAGCA GCTCTGCGTG CGGGGCCTCA GAGAATGAGG 240 

CCGGCGTTCG CCCTGTGCCT CCTCTGGCAG GCGCTCTGGC CCGGGCCGGG CGGCGGCGAA 300

CACCCCACTG CCGACCGTGC TGGCTGCTCG GCCTCGGGGG CCTGCTACAG CCTGCACCAC 360 

GCTACCATGA AGCGGCAGGC GGCCGAGGAG GCCTGCATCC TGCGAGGTGG GGCGCTCAGC 420 

ACCGTGCGTG CGGGCGCCGA GCTGCGCGCT GTGCTCGCGC TCCTGCGGGC AGGCCCAGGG 480 

CCCGGAGGGG GCTCCAAAGA CCTGCTGTTC TGGGTCGCAC TGGAGCGCAG GCGTTCCCAC 540

TGCACCCTGG AGAACGAGCC TTTGCGGGGT TTCTCCTGGC TGTCCTCCGA CCCCGGCGGT 600 

CTCGAAAGCG ACACGCTGCA GTGGGTGGAG GAGCCCCAAC GCTCCTGCAC CGCGCGGAGA 660 

TGCGCGGTAC TCCAGGCCAC CGGTGGGGTC GAGCCCGCAG CTGGAAGGAG ATGCGATGCC 720 

ACCTGCGCGC CAACGGCTAC CTGTGCAAGT ACCAGTTTGA GGTCTTGTGT CCTGCGCCGC 780 

GCCCCGGGGC CGCCTCTAAC TTGAGCTATC GCGCGCCCTT CCAGCTGCAC AGCGCCGCTC 840 

TGGACTTCAG TCCACCTGGG ACCGAGGTGA GTGCGCTCTG CCGGGGACAG CTCCCGATCT 900 

CAGTTACTTG CATCGCGGAC GAAATCGGCG CTCGCTGGGA CAAACTCTCG GGCGATGTGT 960 

TGTGTCCCTG CCCCGGGAGG TACCTCCGTG CTGGCAAATG CGCAGAGCTC CCTAACTGCC 1020 

TAGACGACTT GGGAGGCTTT GCCTGCGAAT GTGCTACGGG CTTCGAGCTG GGGAAGGACG 1080 

GCCGCTCTTG TGTGACCAGT GGGGAAGGAC AGCCGACCCT TGGGGGGACC GGGGTGCCCA 1140 

CCAGGCGCCC GCCGGCCACT GCAACCAGCC CCGTGCCGCA GAGAACATGG CCAATCAGGG 1200 

TCGACGAGAA GCTGGGAGAG ACACCACTTG TCCCTGAACA AGACAATTCA GTAACATCTA 1260

TTCCTGAGAT TCCTCGATGG GGATCACAGA GCACGATGTC TACCCTTCAA ATGTCCCTTC 1320

AAGCCGAGTC AAAGGCCACT ATCACCCCAT CAGGGAGCGT GATTTCCAAG TTTAATTCTA 1380

CGACTTCCTC TGCCACTCCT CAGGCTTTCG ACTCCTCCTC TGCCGTGGTC TTCATATTTG 1440

TGAGCACAGC AGTAGTAGTG TTGGTGATCT TGACCATGAC AGTACTGGGG CTTGTCAAGC 1500

TCTGCTTTCA CGAAAGCCCC TCTTCCCAGC CAAGGAAGGA GTCTATGGGC CCGCCGGGCC 1560 

TGGAGAGTGA TCCTGAGCCC GCTGCTTTGG GCTCCAGTTC TGCACATTGC ACAAACAATG 1620 

GGGTGAAAGT CGGGGACTGT GATCTGCGGG ACAGAGCAGA GGGTGCCTTG CTGGCGGAGT 1680
CCCCTCTTGG CTCTAGTGAT GCATAG 

ACH7 predicted coding seq (predicted start/stop codons underlined) 
ATGGGCAAAG ACTTCATGAC TAAAACACCA AAAGCATTTG CAACAAAAGC CAAAATTGAC 60
AAATGGGATC TAATTAAACT AAAGAGCTTC TGCACAGCAA AAGAAACTAT CATCAGAGTG 120
AACAGTCAAC CTACAGACTG GCAGAAAACT TTTGCAATCT ATCCATCTGA CAAAGGGGTA 180 
ATAGCCAGAA TCTACAAGGA GCTTGAACAA ATTTATAAGA AAAAAAAACC AACAAAAACG 240 
CTCCGCACAC ATTTCCTGTC GCGGCCTAAG GGAAACTGTT GGCCGCTGGG CCCGCGGGGG 300
GATTCTTGGC AGTTGGGGGG TCCGTCGGGA GCGAGGGCGG AGGGGAAGGG AGGGGGAACC 360
GGGTTGGGGA AGCCAGCTGT AGAGGGCGGT GACCGCGCTC CAGACACAGC TCTGCGTCCT 420
CGAGCGGGAC AGATCCAAGT TGGGAGCAGC TCTGCGTGCG GGGCCTCAGA GAATGAGGCC 480
GGCGTTCGCC CTGTGCCTCC TCTGGCAGGC GCTCTGGCCC GGGCCGGGCG GCGGCGAACA 540
CCCCACTGCC GACCGTGCTG GCTGCTCGGC CTCGGGGGCC TGCTACAGCC TGCACCACGC 600
TACCATGAAG CGGCAGGCGG CCGAGGAGGC CTGCATCCTG CGAGGTGGGG CGCTCAGCAC 660
CGTGCGTGCG GGCGCCGAGC TGCGCGCTGT GCTCGCGCTC CTGCGGGCAG GCCCAGGGCC 720
CGGAGGGGGC TCCAAAGACC TGCTGTTCTG GGTCGCACTG GAGCGCAGGC GTTCCCACTG 780
CACCCTGGAG AACGAGCCTT TGCGGGGTTT CTCCTGGCTG TCCTCCGACC CCGGCGGTCT 840
CGAAAGCGAC ACGCTGCAGT GGGTGGAGGA GCCCCAACGC TCCTGCACCG CGCGGAGATG 900
CGCGGTACTC CAGGCCACCG GTGGGGTCGA GCCCGCAGCT GGAAGGAGAT GCGATGCCAC 960
CTGCGCGCCA ACGGCTACCT GTGCAAGTAC CAGTTTGAGG TCTTGTGTCC TGCGCCGCGC 1020
CCCGGGGCCG CCTCTAACTT GAGCTATCGC GCGCCCTTCC AGCTGCACAG CGCCGCTCTG 1080
GACTTCAGTC CACCTGGGAC CGAGGTGAGT GCGCTCTGCC GGGGACAGCT CCCGATCTCA 1140
GTTACTTGCA TCGCGGACGA AATCGGCGCT CGCTGGGACA AACTCTCGGG CGATGTGTTG 1200
TGTCCCTGCC CCGGGAGGTA CCTCCGTGCT GGCAAATGCG CAGAGCTCCC TAACTGCCTA 1260
GACGACTTGG GAGGCTTTGC CTGCGAATGT GCTACGGGCT TCGAGCTGGG GAAGGACGGC 1320
CGCTCTTGTG TGACCAGTGG GGAAGGACAG CCGACCCTTG GGGGGACCGG GGTGCCCACC 1380
AGGCGCCCGC CGGCCACTGC AACCAGCCCC GTGCCGCAGA GAACATGGCC AATCAGGGTC 1440
GACGAGAAGC TGGGAGAGAC ACCACTTGTC CCTGAACAAG ACAATTCAGT AACATCTATT 1500
CCTGAGATTC CTCGATGGGG ATCACAGAGC ACGATGTCTA CCCTTCAAAT GTCCCTTCAA 1560
GCCGAGTCAA AGGCCACTAT CACCCCATCA GGGAGCGTGA TTTCCAAGTT TAATTCTACG 1620
ACTTCCTCTG CCACTCCTCA GGCTTTCGAC TCCTCCTCTG CCGTGGTCTT CATATTTGTG 1680
AGCACAGCAG TAGTAGTGTT GGTGATCTTG ACCATGACAG TACTGGGGCT TGTCAAGCTC 1740
TGCTTTCACG AAAGCCCCTC TTCCCAGCCA AGGAAGGAGT CTATGGGCCC GCCGGGCCTG 1800
GAGAGTGATC CTGAGCCCGC TGCTTTGGGC TCCAGTTCTG CACATTGCAC AAACAATGGG 1860
GTGAAAGTCG GGGACTGTGA TCTGCGGGAC AGAGCAGAGG GTGCCTTGCT GGCGGAGTCC 1920
CCTCTTGGCT CTAGTGATGC ATAG



AAD3 DNA sequence 

Gene name: ESTs 

Unigene number: Hs.17404

Probeset Accession #: N39584

Nucleic Acid Accession #: N39584 

Coding sequence: no identified ORF; possible frameshifts 
AAATGGGATT GAGTTAAAAC TATTTTATTT TAAATATACA TTTTAAAGCA GTTCTTTTTT 60

TTTTTTTTTT TTTTATTATA CACACACTTC AAGAGAATAT GCACAGTCTA GGCCGGGCAC 120

GGTGGCTCAC GCCTGTAATC CCAGCACTTT GGGAGGCCGA GGCATGTGGA TCACCTGAGG 180

TCAGGAGTTT GAGACCAGCC TAGACAACAT GGTGAAACCT TGTCTCTATG AAAAATACAA 240

AATTTGCTGG GAGTGGTGGT GCATGCCTGT AATCCCAGCT ACTTGGAAGG CTGAGGCAGG 300

AGAATGTCTT GAACCTAGGA GGTGGAGGTT GCAGTGAGCT GAGATTGCAC CATTGCACTC 360

CAGCCTGTGC AACAAAAGTG AAACTCCATT TCAAGAAAAA AAAAAAAAAA AGAATATGCA 420

CAGTCTGAAT GTATACCAGG AGTGTGAGAG ACACATGCCC ACTTCATGCA ACTCCTAAAC 480

TCAAAGTCTA AATCAGATAT TTTTATTAAC AATGACAACT TGTTGCCAAC TCCCTGTTTC 540

TAATCACCAA AGACCCAGGG TACCTAAAAG GACTTTGCAA CCAAGCAAAG TCACTGTCTT 600

CAAATCTGGA TACACACTTT COPCTCTGTA GATTCAAAAG GTGCTTCCTT CCCGGCTGTC 660

TCCAGCTTCC TTACTCTCTT TTCTGGGATT TCTTTTTCTT CTTTCTTTCT GGCTCTTCCT 720

CCACTGGCTG AACTGGGTCC CCTAACTGAA ACAGCCCCTG ACTTAGCCCA AGCATGCTTC 780

CTTTAGCTGC TGTGAGAATT TTGTCTTCCT CACCAGCCAG GTCCTCAAGG CAAAGTCCTC 840

AGCCAGTGCT TTAAGAGCAA CTTCCCGCAA ATCAGAAACT CACTGTGATT CCAAAAATGT 900

TTCTGAGCCC TGGACCCCTG CCCCCAAAAT ATTTTCATCT TTCCCCCAAA CCTCCTTTAA 960

AGGAGCATGC ATAACAGTGT GCTGAAAGAC AGTTGTTGGT TTTTTGATTT TAGCATATTA 1020

TTTCCTGTAT GAAATATGTT TTATATAATC TCCTATTATT TTTATCTTAT GTTTTGTATT 1080

GTTGATAAAT CCCTTTTTGT CCTTCTAAGA TGTTCTATTG TAAAATCACT TATAAGGTAT 1140

GATTACTCTT TATGCTATTA CTTTATATGC CATTTGGGTA ATAAATAGTA AATGGTTGAT 1200

GATATGATTG ACTGATGCGC AGTCCAGAGC ATGTATGAAT AATCTCATAA AACAGTATCA 1260

CAGACATTAA GCTAAACTGT TTCGTTTTTT TGAAAGAACA ACTCATACTT TGGAACAGTT 1320

GTCAATATTA ATTTGTTGCA AATATTTAAT TTAAATAAAC ATTTTTGTAC CATGAAAAAA 1380

AAAAAAAAAA AAAAAAAAAA AAAAAAAA



AAD4 DNA sequence 
Gene name: ERG

Unigene number: Hs.279477 / Hs.45514
Probeset Accession #: R32894 
Nucleic Acid Accession #: M17254 
Coding sequence: 257-1645 (predicted start/stop codons underlined)

GTCCGCGCGT GTCCGCGCCC GCGTGTGCCA GCGCGCGTGC CTTGGCCGTG CGCGCCGAGC 60 

CGGGTCGCAC TAACTCCCTC GGCGCCGACG GCGGCGCTAA CCTCTCGGTT ATTCCAGGAT 120 

CTTTGGAGAC CCGAGGAAAG CCGTGTTGAC CAAAAGCAAG ACAAATGACT CACAGAGAAA 180 

AAAGATGGCA GAACCAAGGG CAACTAAAGC CGTCAGGTTC TGAACAGCTG GTAGATGGGC 240 

TGGCTTACTG AAGGACATGA TTCAGACTGT CCCGGACCCA GCAGCTCATA TCAAGGAAGC 300 

CTTATCAGTT GTGAGTGAGG ACCAGTCGTT GTTTGAGTGT GCCTACGGAA CGCCACACCT 360 

GGCTAAGACA GAGATGACCG CGTCCTCCTC CAGCGACTAT GGACAGACTT CCAAGATGAG 420 

CCCACGCGTC CCTCAGCAGG ATTGGCTGTC TCAACCCCCA GCCAGGGTCA CCATCAAAAT 480 

GGAATGTAAC CCTAGCCAGG TGAATGGCTC AAGGAACTCT CCTGATGAAT GCAGTGTGGC 540

CAAAGGCGGG AAGATGGTGG GCAGCCCAGA CACCGTTGGG ATGAACTACG GCAGCTACAT 600 

GGAGGAGAAG CACATGCCAC CCCCAAACAT GACCACGAAC GAGCGCAGAG TTATCGTGCC 660 

AGCAGATCCT ACGCTATGGA GTACAGACCA TGTGCGGCAG TGGCTGGAGT GGGCGGTGAA 720 

AGAATATGGC CTTCCAGACG TCAACATCTT GTTATTCCAG AACATCGATG GGAAGGAACT 780

GTGCAAGATG ACCAAGGACG ACTTCCAGAG GCTCACCCCC AGCTACAACG CCGACATCCT 840 

TCTCTCACAT CTCCACTACC TCAGAGAGAC TCCTCTTCCA CATTTGACTT CAGATGATGT 900

TGATAAAGCC TTACAAAACT CTCCACGGTT AATGCATGCT AGAAACACAG ATTTACCATA 960 

TGAGCCCCCC AGGAGATCAG CCTGGACCGG TCACGGCCAC CCCACGCCCC AGTCGAAAGC 1020 

TGCTCAACCA TCTCCTTCCA CAGTGCCCAA AACTGAAGAC CAGCGTCCTC AGTTAGATCC 1080 

TTATCAGATT CTTGGACCAA CAAGTAGCCG CCTTGCAAAT CCAGGCAGTG GCCAGATCCA 1140 

GCTTTGGCAG TTCCTCCTGG AGCTCCTGTC GGACAGCTCC AACTCCAGCT GCATCACCTG 1200 

GGAAGGCACC AACGGGGAGT TCAAGATGAC GGATCCCGAC GAGGTGGCCC GGCGCTGGGG 1260

AGAGCGGAAG AGCAAACCCA ACATGAACTA CGATAAGCTC AGCCGCGCCC TCCGTTACTA 1320

CTATGACAAG AACATCATGA CCAAGGTCCA TGGGAAGCGC TACGCCTACA AGTTCGACTT 1380 

CCACGGGATC GCCCAGGCCC TCCAGCCCCA CCCCCCGGAG TCATCTCTGT ACAAGTACCC 1440

CTCAGACCTC CCGTACATGG GCTCCTATCA CGCCCACCCA CAGAAGATGA ACTTTGTGGC 1500 

GCCCCACCCT CCAGCCCTCC CCGTGACATC TTCCAGTTTT TTTGCTGCCC CAAACCCATA 1560 

CTGGAATTCA CCAACTGGGG GTATATACCC CAACACTAGG CTCCCCACCA GCCATATGCC 1620 

TTCTCATCTG GGCACTTACT ACTAAAGACC TGGCGGAGGC TTTTCCCATC AGCGTGCATT 1680

CACCAGCCCA TCGCCACAAA CTCTATCGGA GAACATGAAT CAAAAGTGCC TCAAGAGGAA 1740 

TGAAAAAAGC TTTACTGGGG CTGGGGAAGG AAGCCGGGGA AGAGATCCAA AGACTCTTGG 1800 

GAGGGAGTTA CTGAAGTCTT ACTACAGAAA TGAGGAGGAT GCTAAAAATG TCACGAATAT 1860 

GGACATATCA TCTGTGGACT GACCTTGTAA AAGACAGTGT ATGTAGAAGC ATGAAGTCTT 1920 

AAGGACAAAG TGCCAAAGAA AGTGGTCTTA AGAAATGTAT AAACTTTAGA GTAGAGTTTG 1980 

AATCCCACTA ATGCAAACTG GGATGAAACT AAAGCAATAG AAACAACACA GTTTTGACCT 2040 

AACATACCGT TTATAATGCC ATTTTAAGGA AAACTACCTG TATTTAAAAA TAGTTTCATA 2100 

TCAAAAACAA GAGAAAAGAC ACGAGAGAGA CTGTGGCCCA TCAACAGACG TTGATATGCA 2160 

ACTGCATGGC ATGTGCTGTT TTGGTTGAAA TCAAATACAT TCCGTTTGAT GGACAGCTGT 2220 

CAGCTTTCTC AAACTGTGAA GATGACCCAA AGTTTCCAAC TCCTTTACAG TATTACCGGG 2280 

ACTATGAACT AAAAGGTGGG ACTGAGGATG TGTATAGAGT GAGCGTGTGA TTGTAGACAG 2340 

AGGGGTGAAG AAGGAGGAGG AAGAGGCAGA GAAGGAGGAG ACCAGGCTGG GAAAGAAACT 2400 

TCTCAAGCAA TGAAGACTGG ACTCAGGACA TTTGGGGACT GTGTACAATG AGTTATGGAG 2460 

ACTCGAGGGT TCATGCAGTC AGTGTTATAC CAAACCCAGT GTTAGGAGAA AGGACACAGC 2520 

GTAATGGAGA AAGGGAAGTA GTAGAATTCA GAAACAAAAA TGCGCATCTC TTTCTTTGTT 2580 

TGTCAAATGA AAATTTTAAC TGGAATTGTC TGATATTTAA GAGAAACATT CAGGACCTCA 2640 

TCATTATGTG GGGGCTTTGT TCTCCACAGG GTCAGGTAAG AGATGGCCTT CTTGGCTGCC 2700 

ACAATCAGAA ATCACGCAGG CATTTTGGGT AGGCGGCCTC CAGTTTTCCT TTGAGTCGCG 2760 

AACGCTGTGC GTTTGTCAGA ATGAAGTATA CAAGTCAATG TTTTTCCCCC TTTTTATATA 2820 

ATAATTATAT AACTTATGCA TTTATACACT ACGAGTTGAT CTCGGCCAGC CAAAGACACA 2880

CGACAAAAGA GACAATCGAT ATAATGTGGC CTTGAATTTT AACTCTGTAT GCTTAATGTT 2940 

TACAATATGA AGTTATTAGT TCTTAGAATG CAGAATGTAT GTAATAAAAT AAGCTTGGCC 3000

TAGCATGGCA AATCAGATTT ATACAGGAGT CTGCATTTGC ACTTTTTTTA GTGACTAAAG 3060

TTGCTTAATG AAAACATGTG CTGAATGTTG TGGATTTTGT GTTATAATTT ACTTTGTCCA 3120 
GGAACTTGTG CAAGGGAGAG CCAAGGAAAT AGGATGTTTG GCACCC



AAD5 DNA sequence 

Gene name: activin A receptor type II-like 1 (ALK-1)
Unigene number: Hs.8881 / Hs.172670
Probeset Accession #: T57112 
Nucleic Acid Accession #: NM_000020 

Coding sequence: 283-1794 (predicted start/stop codons underlined) 

AGGAAACGGT TTATTAGGAG GGAGTGGTGG AGCTGGGCCA GGCAGGAAGA CGCTGGAATA 60 

AGAAACATTT TTGCTCCAGC CCCCATCCCA GTCCCGGGAG GCTGCCGCGC CAGCTGCGCC 120 

GAGCGAGCCC CTCCCCGGCT CCAGCCCGGT CCGGGGCCGC GCCGGACCCC AGCCCGCCGT 180 

CCAGCGCTGG CGGTGCAACT GCGGCCGCGC GGTGGAGGGG AGGTGGCCCC GGTCCGCCGA 240

AGGCTAGCGC CCCGCCACCC GCAGAGCGGG CCCAGAGGGA CCATGACCTT GGGCTCCCCC 300

AGGAAAGGCC TTCTGATGCT GCTGATGGCC TTGGTGACCC AGGGAGACCC TGTGAAGCCG 360 

TCTCGGGGCC CGCTGGTGAC CTGCACGTGT GAGAGCCCAC ATTGCAAGGG GCCTACCTGC 420 

CGGGGGGCCT GGTGCACAGT AGTGCTGGTG CGGGAGGAGG GGAGGCACCC CCAGGAACAT 480 

CGGGGCTGCG GGAACTTGCA CAGGGAGCTC TGCAGGGGGC GCCCCACCGA GTTCGTCAAC 540 

CACTACTGCT GCGACAGCCA CCTCTGCAAC CACAACGTGT CCCTGGTGCT GGAGGCCACC 600 

CAACCTCCTT CGGAGCAGCC GGGAACAGAT GGCCAGCTGG CCCTGATCCT GGGCCCCGTG 660 

CTGGCCTTGC TGGCCCTGGT GGCCCTGGGT GTCCTGGGCC TGTGGCATGT CCGACGGAGG 720 

CAGGAGAAGC AGCGTGGCCT GCACAGCGAG CTGGGAGAGT CCAGTCTCAT CCTGAAAGCA 780 

TCTGAGCAGG GCGACACGAT GTTGGGGGAC CTCCTGGACA GTGACTGCAC CACAGGGAGT 840 

GGCTCAGGGC TCCCCTTCCT GGTGCAGAGG ACAGTGGCAC GGCAGGTTGC CTTGGTGGAG 900 

TGTGTGGGAA AAGGCCGCTA TGGCGAAGTG TGGCGGGGCT TGTGGCACGG TGAGAGTGTG 960 

GCCGTCAAGA TCTTCTCCTC GAGGGATGAA CAGTCCTGGT TCCGGGAGAC TGAGATCTAT 1020 

AACACAGTAT TGCTCAGACA CGACAACATC CTAGGCTTCA TCGCCTCAGA CATGACCTCC 1080 

CGCAACTCGA GCACGCAGCT GTGGCTCATC ACGCACTACC ACGAGCACGG CTCCCTCTAC 1140 

GACTTTCTGC AGAGACAGAC GCTGGAGCCC CATCTGGCTC TGAGGCTAGC TGTGTCCGCG 1200 

GCATGCGGCC TGGCGCACCT GCACGTGGAG ATCTTCGGTA CACAGGGCAA ACCAGCCATT 1260 

GCCCACCGCG ACTTCAAGAG CCGCAATGTG CTGGTCAAGA GCAACCTGCA GTGTTGCATC 1320

GCCGACCTGG GCCTGGCTGT GATGCACTCA CAGGGCAGCG ATTACCTGGA CATCGGCAAC 1380

AACCCGAGAG TGGGCACCAA GCGGTACATG GCACCCGAGG TGCTGGACGA GCAGATCCGC 1440 

ACGGACTGCT TTGAGTCCTA CAAGTGGACT GACATCTGGG CCTTTGGCCT GGTGCTGTGG 1500 

GAGATTGCCC GCCGGACCAT CGTGAATGGC ATCGTGGAGG ACTATAGACC ACCCTTCTAT 1560

GATGTGGTGC CCAATGACCC CAGCTTTGAG GACATGAAGA AGGTGGTGTG TGTGGATCAG 1620 

CAGACCCCCA CCATCCCTAA CCGGCTGGCT GCAGACCCGG TCCTCTCAGG CCTAGCTCAG 1680 

ATGATGCGGG AGTGCTGGTA CCCAAACCCC TCTGCCCGAC TCACCGCGCT GCGGATCAAG 1740 

AAGACACTAC AAAAAATTAG CAACAGTCCA GAGAAGCCTA AAGTGATTCA ATAGCCCAGG 1800 

AGCACCTGAT TCCTTTCTGC CTGCAGGGGG CTGGGGGGGT GGGGGGCAGT GGATGGTGCC 1860 

CTATCTGGGT AGAGGTAGTG TGAGTGTGGT GTGTGCTGGG GATGGGCAGC TGCGCCTGCC 1920 

TGCTCGGCCC CCAGCCCACC CAGCCAAAAA TACAGCTGGG CTGAAACCTG ATCCCCTGC^ 1980 

GTCTGGCCTG CTCAAAGCGG CAGGCTCCCT GACGCCTGGC TCTCTCCCCA CCCCTATGGC 2040

CAGCATGGTG CACCCCCTAC CACTCCCGGG ACAGGATGCA AAAGAGGCTC CAGAGTCAGA 2100 

GTGCCAAGCC AGGGAATCCC AGTCCCAGAC TCAGAGCCCG GGCCTGCACT TTGCCCCCTG 2160 

CCCTTGATCA ACCCCACTGC CCCACCAGAG CTGCCAGGGT GGCACAGGGC CCTGTCCAGC 2220 

CCCTGGCACA CACTTCCCTG CCAGGCCTCA GCCTCTAGCA TAAGCTCCAG AGAGCCAGGG 2280 

CCCATCAGTT TCTCTCTGTG GATTTGTATC TCAGCTCCAT GATGCCTTGG GCTTTCTGTC 2340 

TCCTCAACAA GAGTGCAGCT TGCTGAATGT CAGCTGCCTG AGAGAGCTGG GGCCTGACTT 2400 

ACTAGGGCAT TAAATCCTAA GAGGTCCTAC TGAGGTGTGG CAGGATCACA GGCCAGTGGA 2460 

AAAAGGGCAG GTCAGATGGG CAAGGCCCAG GACTTTCAGA TTAACTGAGA GGATATCGAG 2520 

GCCAAGCATG GCAGGGGGAA GGTCAGTGGG TGTCAAGAGA CCCAGGTCTG ACCCCGGATG 2580 

TTTGCTCCAT GTGACAAAAG CAGGCCTGTC TCAGGACCTT TTCTTTTCTT TTTTCCTTCT 2640 

TTTTTTTTTT GACACGGAGT TTCGCTCTTG TTGTCCAGGC TAGAGTGCAA TGGCATGATC 2700 

CCAGCTCACC GCAACGTCTA CCTCCCAGGT TCAAATCATT CTCTTGCCTC AGACTCCCGA 2760 

GTAGCTGGGA TTACAGGCAC ATGCCACCAT GCCTGGCTAA TTTTGTATAT TTAGTAGAAA 2820 

CAGGGTTTCA CCATGCTGGC CATGCTGGTT CTCGAACTCC TGACCTCAGG TGTTCCACCT 2880 

ACCTCAGCCT CCCAAAGTGC TGGGGTTACA GGTGTGAGCC ATCGCGCCTG GCCAGGACCT 2940 

TTGTTTCTTA TCTACATATT GGAAGATTTG GTCCTGATGT CCTTTGAGGC TTCTTTAGCT 3000 

CTAGTTCTCT GACACTTCAG CCTATATCAC AGCTAACTTC YTCAGTCTCA TCTATTCCTT 3060 

ATGCTCCAGC CCCTGGCAAT TTGCCTCAAG ATGGGGGTTT GAAAATAACT TTACCTGACT 3120 

CAAGGAGTGT CTGGAGCACC TCCTAGTCTA AGTCTGCAAG CTCCAGTTCT TGCCTAAAAC 3180

CATGCCAGTG GCCACCCTTG GGCTCAGACA GCTCTGGGCC TTTTGACCAC AAGCCAGCCC 3240 

CTCGCCCTCT CTGTGGCATA GTCTTCTCTG CCCCAGGACT GCAGGGCGGC TTCCTCCAAG 3300 

GCTTCCAAGG CTCAAAAGAA ATTTGGCTCC ATCCAAGAAG GCTCCAGCTC CCCTACTGGC 3360 

CCCTGGCTTC AGGCCCACAC CCCTGGGCCA GGSCCAGAGA GTGTGTCTCA GGAGAATTCA 3420

ATGGGCTCTA GAGAGACACA CAGAAAGTTT GGGCATTTGG GAAATTTTCA AGGRTGTATG 3480

TATGGYTCAC GTATGGWGCA GGTTGTCCTG GTCCYKGGGT GCAGGGAAGT GGGCTGCAGG 3540

GAAGTGGATT GGAGGGGAGC TTGAGGAATA TAAGGAGCGG GGGTGGAGAC TCAGGCTATG 3600 

GACAAGGACA GCCCCAAGGT TGGGAAGACC TGGCCTTAGT CGTCCTCAGC CTAGGGCAGG 3660

GCAGTGAAGA AAGCTCTCCC CGCTCCTGCT GTAATGACCC AGAGTAGCCT CCCCAGGCCG 3720 

GCATCTTATG TGTGTCTTCC ACCATCCTCA TGGTGGCACT TTTCTAGGCC TGTCTCCCAG 3780

CATTGTGCAA GGCTCGGAAG AGAACCAC * 7 V AGTGAAACTG GGTGAAAACA GAAAGCTCAA 3840

TGGATGGGCT AGGTTCCCAG ATCATTAGJG CAGAGTTTGC ACGTCCTCTG GTTCACTGGG 3900 

AATCCACCCA GCCCACGAAT CATCTCCCTC TTTGAAGGAT TTTWATTTCT ACTGGGTTTT 3960 

GGAACAAACT CCTGCTGAGA CCCCACAGCC AGAAACTGAA AGCAGCAGCT CCCCAAAGCC 4020

TGGAAAATCC CTAAGAGAAG GCCTGGGGGA MAGGAAKTGG AGTGACAGGG GACAGGTAGA 4080

GAGAAGGGGG CCCAATGGCC AGGGAGTGAA GGAGGTGGCG TTGCTGAGAG CAGTCTGCAC 4140 

ATGCTTCTGT CTGAGTGCAG GAAGGTGTTC CAGGGTCGAA ATTACACTTC TCGTACCTGG 4200 

AGACGCTGTT TGTGGGAGCA CTGGGCTCAT GCCTGGCACA CAATAGGTCT GCAATAAACC 4260 
ATGGTTAAAT CCTGAAAAAA AAAAAAAAA 
AAD8 DNA sequence
Gene name: ESTs 
Unigene number: Hs.144953

Probeset Accession #: AA404418 
Nucleic Acid Accession #: n/a 

Coding sequence: no ORF identified; possible frameshifts 

TATGTCCACC AAAGACACCT CGTTGGTCAT GTTCTATCAC CTCTTCGTCA AATTGACATC 60

AGGTCCTAAC AGGTCACTTT CAAGATACAG AAGAGGCAAA TTTTGTTTTG AGACTTGGCC 120 

ATTCCTAGGG TCAGCAAAGT GTATTCCTGG CAGCCAGACC TTCAGTCACT TATCAGGAAA 180 

TGCTTGACCT AAAGACAGAC AATTCTTTCC CCAAACTTTG CTGTTTCTTT TTTGAGTCTT 240

TGTTGAAAGA TTTCTTTTAA AAGGCGTTCG TGTGAGAAGA TCACAGCAAC AAATCTGGCT 300 

TGTTCTGTTT TAGACTTACT TTCTTAACTC TTGGGCAGAA GAAAATGAAT GAGATTTGAA 360

GACCTTTGAT ACCTTGGGTA GACAAAGCTT GCCTTGAAAC TAGAAATAAG ACGAAACTAG 420 

ATTTTAAGGG GAAAAAATTT GCTAGTGGTA ATATAATTGG TTTTGTTTCA TTTTTTTATG 480

AGTCTGAGGA GTTGACATTA AACGTTGGGA TGTTGCTTTG TTAATGAAGT CATTTCAATT 540 

TTTGCAACTC TTAACATCTG CATGCTTCCA TAAACAGTGG GTTGGAACAA AAGAAAATGT 600 

GACTAAGGGA TATTCCTTAA ATTCTTTTTT ATGTTATGAG AGAGAATATT GGAATATAAA 660

GAATGTTACT TTATCTGGTA AACCATCTCA TAGGCCAGAA GCACTAACAG TTTGAATGGT 720 

TGGCTTAAAA AAAAACGGGA GTCTTTGAAT TTAAGCTTAT GTAAAATTAC TATGCAAATA 780 

TAGGTTATTA TTTATTTTTA CAGTGAAAAT AAAACACTAT TGAAGTATAA ATGGAAAGAA 840

AATAAAAGCA AAGCCTGTTT AATATAGAGA CATTAATGTT GATATCACTG TACGAACAGT 900 

CATAGCTTGC TGCTCACTGC CGTTAAAGGG TTGACATACA AACATTGTGG AAGAGATTTC 960

AGTTTGAGGG CTAGTGTCTG AATTATGGAC TCCTTACCCT ACTCCACCAC TTAAAACATT 1020

TTAGAGACTT TTGTGAAATT AACAGGTCAT ATAATTAATA ATTGTTGTTT TATGTACATT 1080

TATTGAAAGG CCATATTGAG GCTCCATTGA TTTTTTTTCC TGCATATTTA TCAGTATCGA 1140

ATTAGAAAAT TGAACCTTCA GTGTTACTAG ATGGAAATCT ACCAAAAAGT AGCAAGGTTT 1200

ACGAATGGTG GGATTTATTG GTGATTAAAC ATTTTTTTCC TGTATTTTAT AAGTTTCACA 1260

TTACATTTAC AATGAGAAAA AAATGTAAAT GTAGAATTAA AGTCTTGTTA ATATCGTAAT 1320

TTGCCTATTG CTGTACTAAA AGAAGCTTCT ATAAAATGTA TCATTCTCAT CCTTAGATTC 1380

AGGCCAGAAA GTAACTTTCA GTGTTAGGTA TTTGAAATAA TGCAGCCTGT CATATGTACT 1440

CTGGTTACCA GAATGAAAAA ACAAAAAGAG ATACATACAT AGTAAGGAAA CATGAAATTG 1500

GAGGAATTGA TCCCCATGTG TATTGCAGCT TCATATACCA GTAGTCTCTA ATAAGTCATT 1560
GCTTTAATAA AAAAAAAAAT AGAAAATTTA AA

ACA2 DNA sequence 
Gene name: EST

Unigene number: Hs.16450
Probeset Accession #: AA478778 
Nucleic Acid Accession #: AA478778 

Coding sequence: no ORF identified; possible frameshifts 


TATTTTTGTA CGTAAAATGA TTCTATTATG ACTGCCTTTG CATGTAGTAA TATGACAAAG 60

TGATCCTTCA TTATCACGGT ACACTATTGT TTACTTTTCA TCTGTAAATG TTTTATTGTT 120 

ACTTTTTTAA AATGAATTTT TTTAAAACAA TCTAGCCATC ATCAAGGTGC TATAAGAGTT 180

GTATAAAAGA TATTTTTGGC ATTTCTAGGC AAGTATCAGC CAATAAGTAT GTTAGTGATA 240 

TCACAGATTG TACCAACTAT TAACTATGTT AAATAAGTAT TCAGTTTCAT GTGATCTCTG 300

GGAAAAAAAT ATGCTGCCTT GGTGCTAATA TTGTATGTAT TTAAATGATC ATCTGACTCA 360

GAAATATAAA CACTTTTAAT GAAAGGGAGG AACGGAAGGA CAATTTCCAG TGCACAGAAT 420 

CACTTGGATG AAATAAGACC AGCTCTTTAC CCTTATTTTT GGATATGCCT TTTTTGGAAG 480 

AGACTTAGAC TTTATCCTTA TTGTTGTTAG TGTTGTTAAT ATTCGTTGCT TCAGCCCACG 540 

GTGCCTTGGT CTCTCCACAA TCAAATGGAG GATCCCCCAA GCAGCTTCAT TACAGAGTGA 600

TATTGGGAAA GTGAGATCCT CTCACCATTT TGCCAAGATA CTCTAAAATG ACATCCAAGT 660

TTACCAGTAG AAAGACACAG GATGCACAGA ATGGGCATGA CCTTCAGCTC ACGAGCACAC 720 

CTGGAGAAAT TCAGAACCAG GTTCTGAATC ATCACGATTG CCTTTTGCAT GAAAACATCG 780 

GCTGGTGATG TGACTTCTCT TCAGGCCATG AGCCTAACAY CCTGCCGGTT TTCATGCCCG 840

CTGCAGTAAT GGACGTTTGT GTGAAGAAAT GAACTGTGGA GTACAAAA **- CTTTGAGTCT 900

TTCCGATTGC TCATTAATTC ACTTTTTTGT TACTTCTTTC CAAAATGGAA GTGCTGAAGC 960 

CATGGTCTTT CTGCCCCTCC AAGCTGATGA AGGGAAGCCT TTGCCAATGG CCCATGGAAG 1020 

ACACTTGGTT TGAGAAACCC TGCCCACTTC CAAAGACCAA AGAGATTAGG AAAAGCCTGG 1080 

CAGTATTCTC CAACTCCAAA CAAGCTCTAG AGTGCTCCAG GAAAAGTTAT ATTCAGTATA 1140

TGAATAAGTG TTATTCTCCA TTATTAATGT GTTCTGAAAA TATATTATGA ATAAATACAT 1200
CACCACACCC AAAAAAAAAA AAAAAAAAAA AAAA 
ACA4 DNA sequence

Gene name: alpha satellite junction DNA sequence 
Unigene number: Hs. 247946 
Probeset Accession #: M21305 
Nucleic Acid Accession #: M21305

Coding sequence: 1-165 (predicted start/stop codons underlined) 

ATGGAATGGA ATGGAATGGC ATGGAATCGT ATAAAGTGGA ATGGAATCAA CTCGAGTGGA 
ATGGAATGGA ATGGAATGGA ATGGAATGCA GTACAATGCA ATAGAATGGA ATGGAATGAA 
CTCGAGTTGA CTGGAATGGA ATGGAATGGA ATGCATTTGA ATTGA 



ACG6 DNA sequence 

Gene name: intercellular adhesion molecule 2 (ICAM2) 
Unigene number: Hs. 83733 
Probeset Accession #: M32334 
Nucleic Acid Accession #: NM_000873 

Coding sequence: 63-890 (predicted start/stop codons underlined) 

CTAAAGATCT CCCTCCAGGC AGCCCTTGGC TGGTCCCTGC GAGCCCGTGG AGACTGCCAG 
AGATGTCCTC TTTCGGTTAC AGGACCCTGA CTGTGGCCCT CTTCACCCTG ATCTGCTGTC
CAGGATCGGA TGAGAAGGTA TTCGAGGTAC ACGTGAGGCC AAAGAAGCTG GCGGTTGAGC 
CCAAAGGGTC CCTCGAGGTC AACTGCAGCA CCACCTGTAA CCAGCCTGAA GTGGGTGGTC 
TGGAGACCTC TCTAAATAAG ATTCTGCTGG ACGAACAGGC TCAGTGGAAA CATTACTTGG
TCTCAAACAT CTCCCATGAC ACGGTCCTCC AATGCCACTT CACCTGCTCC GGGAAGCAGG 
AGTCAATGAA TTCCAACGTC AGCGTGTACC AGCCTCCAAG GCAGGTCATC CTGACACTGC
AACCCACTTT GGTGGCTGTG GGCAAGTCCT TCACCATTGA GTGCAGGGTG CCCACCGTGG 
AGCCCCTGGA CAGCCTCACC CTCTTCCTGT TCCGTGGCAA TGAGACTCTG CACTATGAGA 
CCTTCGGGAA GGCAGCCCCT GCTCCGCAGG AGGCCACAGC CACATTCAAC AGCACGGCTp 
ACAGAGAGGA TGGCCACCGC AACTTCTCCT GCCTGGCTGT GCTGGACTTG ATGTCTCGCG 
GTGGCAACAT CTTTCACAAA CACTCAGCCC CGAAGATGTT GGAGATCTAT GAGCCTGTGT 
CGGACAGCCA GATGGTCATC ATAGTCACGG TGGTGTCGGT GTTGCTGTCC CTGTTCGTGA 
CATCTGTCCT GCTCTGCTTC ATCTTCGGCC AGCACTTGCG CCAGCAGCGG ATGGGCACCT
ACGGGGTGCG AGCGGCTTGG AGGAGGCTGC CCCAGGCCTT CCGGCCATAG CAACCATGAG 
TGGCATGGCC ACCACCACGG TGGTCACTGG AACTCAGTGT GACTCCTCAG GGTTGAGGTC 
CAGCCCTGGC TGAAGGACTG TGACAGGCAG CAGAGACTTG GGACATTGCC TTTTCTAGCC 
CGAATACAAA CACCTGGACT T 



ACG7 DNA sequence 

Gene name: Cadherin 5, VE-cadherin (CDH5) 
Unigene number: Hs. 76206 
Probeset Accession #: X79981 
Nucleic Acid Accession #: NM_001795 

Coding sequence: 25-2379 (predicted start/stop codons underlined) 

GCACGATCTG TTCCTCCTGG GAAGATGCAG AGGCTCATGA TGCTCCTCGC CACATCGGGC
GCCTGCCTGG GCCTGCTGGC AGTGGCAGCA GTGGCAGCAG CAGGTGCTAA CCCTGCCCAA
CGGGACACCC ACAGCCTGCT GCCCACCCAC CGGCGCCAAA AGAGAGATTG GATTTGGAAC 
CAGATGCACA TTGATGAAGA GAAAAACACC TCACTTCCCC ATCATGTAGG CAAGATCAAG 
TCAAGCGTGA GTCGCAAGAA TGCCAAGTAC CTGCTCAAAG GAGAATATGT GGGCAAGGTC 
TTCCGGGTCG ATGCAGAGAC AGGAGACGTG TTCGCCATTG AGAGGCTGGA CCGGGAGAAT 
ATCTCAGAGT ACCACCTCAC TGCTGTCATT GTGGACAAGG ACACTGGTGA AAACCTGGAG 
ACTCCTTCCA GCTTCACCAT CAAAGTTCAT GACGTGAACG ACAACTGGCC TGTGTTCACG
CATCGGTTGT TCAATGCGTC CGTGCCTGAG TCGTCGGCTG TGGGGACCTC AGTCATCTCT 
GTGACAGCAG TGGATGCAGA CGACCCCACT GTGGGAGACC ACGCCTCTGT CATGTACCAA 
ATCCTGAAGG GGAAAGAGTA TTTTGCCATC GATAATTCTG GACGTATTAT CACAATAACG 
AAAAGCTTGG ACCGAGAGAA GCAGGCCAGG TATGAGATCG TGGTGGAAGC GCGAGATGCC 
CAGGGCCTCC GGGGGGACTC GGGCACGGCC ACCGTGCTGG TCACTCTGCA AGACATCAAT 
GACAACTTCC CCTTCTTCAC CCAGACCAAG TACACATTTG TCGTGCCTGA AGACACCCGT 
GTGGGCACCT CTGTGGGCTC TCTGTTTGTT GAGGACCCAG ATGAGCCCCA GAACCGGATG 
ACCAAGTACA GCATCTTGCG GGGCGACTAC CAGGACGCTT TCACCATTGA GACAAACCCC 
GCCCACAACG AGGGCATCAT CAAGCCCATG AAGCCTCTGG ATTATGAATA CATCCAGCAA
TACAGCTTCA TCGTCGAGGC CACAGACCCC ACCATCGACC TCCGATACAT GAGCCCTCCC 
GCGGGAAACA GAGCCCAGGT CATTATCAAC ATCACAGATG TGGACGAGCC CCCCATTTTC 
CAGCAGCCTT TCTACCACTT CCAGCTGAAG GAAAACCAGA AGAAGCCTCT GATTGGCACA
GTGCTGGCCA TGGACCCTGA TGCGGCTAGG CATAGCATTG GATACTCCAT CCGCAGGACC 
AGTGACAAGG GCCAGTTCTT CCGAGTCACA AAAAAGGGGG ACATTTACAA TGAGAAAGAA 
CTGGACAGAG AAGTCTACCC CTGGTATAAC CTGACTGTGG AGGCCAAAGA ACTGGATTCC 1380

ACTGGAACCC CCACAGGAAA AGAATCCATT GTGCAAGTCC ACATTGAAGT TTTGGATGAG 1440 

AATGACAATG CCCCGGAGTT TGCCAAGCCC TACCAGCCCA AAGTGTGTGA GAACGCTGTC 1500 

CATGGCCAGC TGGTCCTGCA GATCTCCGCA ATAGACAAGG ACATAACACC ACGAAACGTG 1560

AAGTTCAAAT TCACCTTGAA TACTGAGAAC AACTTTACCC TCACGGATAA TCACGATAAC 1620 

ACGGCCAACA TCACAGTCAA GTATGGGCAG TTTGACCGGG AGCATACCAA GGTCCACTTC 1680 

CTACCCGTGG TCATCTCAGA CAATGGGATG CCAAGTCGCA CGGGCACCAG CACGCTGACC 1740 

GTGGCCGTGT GCAAGTGCAA CGAGCAGGGC GAGTTCACCT TCTGCGAGGA TATGGCCGCC 1800 

CAGGTGGGCG TGAGCATCCA GGCAGTGGTA GCCATCTTAC TCTGCATCCT CACCATCACA 1860 

GTGATCACCC TGCTCATCTT CCTGCGGCGG CGGCTCCGGA AGCAGGCCCG CGCGCACGGC 1920 

AAGAGCGTGC CGGAGATCCA CGAGCAGCTG GTCACCTACG ACGAGGAGGG CGGCGGCGAG 1980

ATGGACACCA CCAGCTACGA TGTGTCGGTG CTCAACTCGG TGCGCCGCGG CGGGGCCAAG 2040

CCCCCGCGGC CCGCGCTGGA CGCCCGGCCT TCCCTCTATG CGCAGGTGCA GAAGCCACCG 2100 

AGGCACGCGC CTGGGGCACA CGGAGGGCCC GGGGAGATGG CAGCCATGAT CGAGGTGAAG 2160 

AAGGACGAGG CGGACCACGA CGGCGACGGC CCCCCCTACG ACACGCTGCA CATCTACGGC 2220

TACGAGGGCT CCGAGTCCAT AGCCGAGTCC CTCAGCTCCC TGGGCACCGA CTCATCCGAC 2280 

TCTGACGTGG ATTACGACTT CCTTAACGAC TGGGGACCCA GGTTTAAGAT GCTGGCTGAG 2340

CTGTACGGCT CGGACCCCCG GGAGGAGCTG CTGTATTAGG CGGCCGAGGT CACTCTGGGC 2400

CTGGGGACCC AAACCCCCTG CAGCCCAGGC CAGTCAGACT CCAGGCACCA CAGCCTCCAA 2460 

AAATGGCAGT GACTCCCCAG CCCAGCACCC CTTCCTCGTG GGTCCCAGAG ACCTCATCAG 2520

CCTTGGGATA GCAAACTCCA GGTTCCTGAA ATATCCAGGA ATATATGTCA GTGATGACTA 2580

TTCTCAAATG CTGGCAAATC CAGGCTGGTG TTCTGTCTGG GCTCAGACAT CCACATAACC 2640 

CTGTCACCCA CAGACCGCCG TCTAACTCAA AGACTTCCTC TGGCTCCCCA AGGCTGCAAA 2700 

GCAAAACAGA CTGTGTTTAA CTGCTGCAGG GTCTTTTTCT AGGGTCCCTG AACGCCCTGG 2760 

TAAGGCTGGT GAGGTCCTGG TGCCTATCTG CCTGGAGGCA AAGGCCTGGA CAGCTTGACT 2820 

TGTGGGGCAG GATTCTCTGC AGCCCATTCC CAAGGGAGAC TGACCATCAT GCCCTCTCTC 2880 

GGGAGCCCTA GCCCTGCTCC AACTCCATAC TCCACTCCAA GTGCCCCACC ACTCCCCAAC 2940

CCCTCTCCAG GCCTGTCAAG AGGGAGGAAG GGGCCCCATG GCAGCTCCTG ACCTTGGGTC 3000 

CTGAAGTGAC CTCACTGGCC TGCCATGCCA GTAACTGTGC TGTACTGAGC ACTGAACCAC 3060 

ATTCAGGGAA ATGCTTATTA AACCTTGAAG CAACTGTGAA TTCATTCTGG AGGGGCAGTG 3120

GAGATCAGGA GTGACAGATC ACAGGGTGAG GGCCACCTCC ACACCCACCC CCTCTGGAGA 3180 

AGGCCTGGAA GAGCTGAGAC CTTGCTTTGA GACTCCTCAG CACCCCTCCA GTTTTGCCTG 3240 

AGAAGGGGCA GATGTTCCCG GAGATCAGAA GACGTCTCCC CTTCTCTGCC TCACCTGGTC 3300 

GCCAATCCAT GCTCTCTTTC TTTTCTCTGT CTACTCCTTA TCCCTTGGTT TAGAGGAACC 3360 

CAAGATGTGG CCTTTAGCAA AACTGACAAT GTCCAAACCC ACTCATGACT GCATGACGGA 3420 

GCCGAGCATG TGTCTTTACA CCTCGCTGTT GTCACATCTC AGGGAACTGA CCCTCAGGCA 3480 

CACCTTGCAG AAGGAAGGCC CTGCCCTGCC CAACCTCTGT GGTCACCCAT GCATCATTCC 3540 

ACTGGAACGT TTCACTGCAA ACACACCTTG GAGAAGTGGC ATCAGTCAAC AGAGAGGGGC 3600 

AGGGAAGGAG ACACCAAGCT CACCCTTCGT CATGGACCGA GGTTCCCACT CTGGCAAAGC 3660 

CCCTCACACT GCAAGGGATT GTAGATAACA CTGACTTGTT TGTTTTAACC AATAACTAGC 3720 

TTCTTATAAT GATTTTTTTA CTAATGATAC TTACAAGTTT CTAGCTCTCA CAGACATATA 3780 

GAATAAGGGT TTTTGCATAA TAAGCAGGTT GTTATTTAGG TTAACAATAT TAATTCAGGT 3840

TTTTTAGTTG GAAAAACAAT TCCTGTAACC TTCTATTTTC TATAATTGTA GTAATTGCTC 3900

TACAGATAAT GTCTATATAT TGGCCAAACT GGTGCATGAC AAGTACTGTA TTTTTTTATA 3960
CCTAAATAAA GAAAAATCTT TAGCCTGGGC AACAAAAAAA 



ACG9 DNA sequence 

Gene name: lysyl oxidase-like 2 (LOXL2)

Unigene number: Hs.83354

Probeset Accession #: U89942 

Nucleic Acid Accession #: NM_002318 cluster

Coding sequence: 248-2572 (predicted start/stop codons underlined) 

ACTCCAGCGC GCGGCTACCT ACGCTTGGTG CTTGCTTTCT CCAGCCATCG GAGACCAGAG 60 

CCGCCCCCTC TGCTCGAGAA AGGGGCTCAG CGGCGGCGGA AGCGGAGGGG GACCACCGTG 120 

GAGAGCGCGG TCCCAGCCCG GCCACTGCGG ATCCCTGAAA CCAAAAAGCT CCTGCTGCTT 180 

CTGTACCCCG CCTGTCCCTC CCAGCTGCGC AGGGCCCCTT CGTGGGATCA TCAGCCCGAA 240 

GACAGGGATG GAGAGGCCTC TGTGCTCCCA CCTCTGCAGC TGCCTGGCTA TGCTGGCCCT 300

CCTGTCCCCC CTGAG^JTGG CACAGTATGA CAGCTGGCCC CATTACCCCG AGTACTTCCA 360

GCAACCGGCT CCTGAGTATC ACCAGCCCCA GGCCCCCGCC AACGTGGCCA AGATTCAGCT 420

GCGCCTGGCT GGGCAGAAGA GGAAGCACAG CGAGGGCCGG GTGGAGGTGT ACTATGATGG 480 

CCAGTGGGGC ACCGTGTGCG ATGACGACTT CTCCATCCAC GCTGCCCACG TCGTCTGCCG 540

GGAGCTGGGC TATGTGGAGG CCAAGTCCTG GACTGCCAGC TCCTCCTACG GCAAGGGAGA 600 

AGGGCCCATC TGGTTAGACA ATCTCCACTG TACTGGCAAC GAGGCGACCC TTGCAGCATG 660

CACCTCCAAT GGCTGGGGCG TCACTGACTG CAAGCACACG GAGGATGTCG GTGTGGTGTG 720 

CAGCGACAAA AGGATTCCTG GGTTCAAATT TGACAATTCG TTGATCAACC AGATAGAGAA 780 

CCTGAATATC CAGGTGGAGG ACATTCGGAT TCGAGCCATC CTCTCAACCT ACCGCAAGCG 840

CACCCCAGTG ATGGAGGGCT ACGTGGAGGT GAAGGAGGGC AAGACCTGGA AGCAGATCTG 900

TGACAAGCAC TGGACGGCCA AGAATTCCCG CGTGGTCTGC GGCATGTTTG GCTTCCCTGG 960 

GGAGAGGACA TACAATACCA AAGTGTACAA AATGTTTGCC TCACGGAGGA AGCAGCGCTA 1020 

CTGGCCATTC TCCATGGACT GCACCGGCAC AGAGGCCCAC ATCTCCAGCT GCAAGCTGGG 1080 

CCCCCAGGTG TCACTGGACC CCATGAAGAA TGTCACCTGC GAGAATGGGC TGCCGGCCGT 1140 

GGTGAGTTGT GTGCCTGGGC AGGTCTTCAG CCCTGACGGA CCCTCGAGAT TCCGGAAAGC 1200 

ATACAAGCCA GAGCAACCCC TGGTGCGACT GAGAGGCGGT GCCTACATCG GGGAGGGCCG 1260 

CGTGGAGGTG CTCAAAAATG GAGAATGGGG GACCGTCTGC GACGACAAGT GGGACCTGGT 1320

GTCGGCCAGT GTGGTCTGCA GAGAGCTGGG CTTTGGGAGT GCCAAAGAGG CAGTCACTGG 1380

CTCCCGACTG GGGCAAGGGA TCGGACCCAT CCACCTCAAC GAGATCCAGT GCACAGGCAA 1440 

TGAGAAGTCC ATTATAGACT GCAAGTTCAA TGCCGAGTCT CAGGGCTGCA ACCACGAGGA 1500 

GGATGCTGGT GTGAGATGCA ACACCCCTGC CATGGGCTTG CAGAAGAAGC TGCGCCTGAA 1560 

CGGCGGCCGC AATCCCTACG AGGGCCGAGT GGAGGTGCTG GTGGAGAGAA ACGGGTCCCT 1620 

TGTGTGGGGG ATGGTGTGTG GCCAAAACTG GGGCATCGTG GAGGCCATGG TGGTCTGCCG 1680 

CCAGCTGGGC CTGGGATTCG CCAGCAACGC CTTCCAGGAG ACCTGGTATT GGCACGGAGA 1740 

TGTCAACAGC AACAAAGTGG TCATGAGTGG AGTGAAGTGC TCGGGAACGG AGCTGTCCCT 1800 

GGCGCACTGC CGCCACGACG GGGAGGACGT GGCCTGCCCC CAGGGCGGAG TGCAGTACGG 1860 

GGCCGGAGTT GCCTGCTCAG AAACCGCCCC TGACCTGGTC CTCAATGCGG AGATGGTGCA 1920 

GCAGACCACC TACCTGGAGG ACCGGCCCAT GTTCATGCTG CAGTGTGCCA TGGAGGAGAA 1980 

CTGCCTCTCG GCCTCAGCCG CGCAGACCGA CCCCACCACG GGCTACCGCC GGCTCCTGCG 2040 

CTTCTCCTCC CAGATCCACA ACAATGGCCA GTCCGACTTC CGGCCCAAGA ACGGCCGCCA 2100

CGCGTGGATC TGGCACGACT GTCACAGGCA CTACCACAGC ATGGAGGTGT TCACCCACTA 2160 

TGACCTGCTG AACCTCAATG GCACCAAGGT GGCAGAGGGC CACAAGGCCA GCTTCTGCTT 2220 

GGAGGACACA GAATGTGAAG GAGACATCCA GAAGAATTAC GAGTGTGCCA ACTTCGGCGA 2280 

TCAGGGCATC ACCATGGGCT GCTGGGACAT GTACCGCCAT GACATCGACT GCCAGTGGGT 2340

TGACATCACT GACGTGCCCC CTGGAGACTA CCTGTTCCAG GTTGTTATTA ACCCCAACTT 2400

CGAGGTTGCA GAATCCGATT ACTCCAACAA CATCATGAAA TGCAGGAGCC GCTATGACGG 2460

CCACCGCATC TGGATGTACA ACTGCCACAT AGGTGGTTCC TTCAGCGAAG AGACGGAAAA 2520

AAAGTTTGAG CACTTCAGCG GGCTCTTAAA CAACCAGCTG TCCCCGCAGT AAAGAAGCCT 2580

GCGTGGTCAA CTCCTGTCTT CAGGCCACAC CACATCTTCC ATGGGACTTC CCCCCAACAA 2640

CTGAGTCTGA ACGAATGCCA CGTGCCCTCA CCCAGCCCGG CCCCCACCCT GTCCAGACCC 2700 

CTACAGCTGT GTCTAAGCTC AGGAGGAAAG GGACCCTCCC ATCATTCATG GGGGGCTGCT 2760 

ACCTGACCCT TGGGGCCTGA GAAGGCCTTG GGGGGGTGGG GTTTGTCCAC AGAGCTGCTG 2820

GAGCAGCACC AAGAGCCAGT CTTGACCGGG ATGAGGCCCA CAGACAGGTT GTCATCAGCT 2880

TGTCCCATTC AAGCCACCGA GCTCACCACA GACACAGTGG AGCCGCGCTC TTCTCCAGTG 2940

ACACGTGGAC AAATGCGGGC TCATCAGCCC CCCCAGAGAG GGTCAGGCCG AACCCCATTT 3000 

CTCCTCCTCT TAGGTCATTT TCAGCAAACT TGAATATCTA GACCTCTCTT CCAATGAAAC 3060 

CCTCCAGTCT ATTATAGTCA CATAGATAAT GGTGCCACGT GTTTTCTGAT TTGGTGAGCT 3120 

CAGACTTGGT GCTTCCCTCT CCACAACCCC CACCCCTTGT TTTTCAAGAT ACTATTATTA 3180 

TATTTTCACA GACTTTTGAA GCACAAATTT ATTGGCATTT AATATTGGAC ATCTGGGCCC 3240 

TTGGAAGTAC AAATCTAAGG AAAAACCAAC CCACTGTGTA AGTGACTCAT CTTCCTGTTG 3300 

TTCCAATTCT GTGGGTTTTT GATTCAACGG TGCTATAACC AGGGTCCTGG GTGACAGGGC 3360 
GCTCACTGAG CACCATGTGT CATCACAGAC ACTTACACAT ACTTGAAACT TGGAATAAAA 3420 
GAAAGATTTA TG 
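Each record in this listing gives its sequence as rows of six 10-base blocks, with the position of the row's last base at the end of the row, and states the coding region as a base range. Judging from the ACG9 (LOXL2) record, where the ATG begins at base 248 and a TAA stop ends at base 2572, these ranges are 1-based and inclusive. The Python sketch below shows one way to reassemble such a listing and cut out the coding region under those assumptions. The helper names (parse_row, assemble, extract_cds, looks_like_orf) are hypothetical, and the input is a toy row, not one of the sequences in this listing. The sketch also assumes clean rows: a row still carrying OCR residue (a position number split by a space, or a stray non-base character) is rejected with ValueError rather than silently repaired.

```python
from typing import Iterable, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
VALID_BASES = set("ACGTN")


def parse_row(row: str) -> Tuple[str, Optional[int]]:
    """Split a listing row such as 'ACGTACGTAC GGTT 14' into bases and its end position."""
    tokens = row.split()
    position = None
    if tokens and tokens[-1].isdigit():
        position = int(tokens.pop())  # trailing number = position of the row's last base
    bases = "".join(tokens).upper()
    if not set(bases) <= VALID_BASES:
        raise ValueError(f"unexpected characters in row: {row!r}")
    return bases, position


def assemble(rows: Iterable[str]) -> str:
    """Concatenate rows, checking each trailing position against the running length."""
    seq = ""
    for row in rows:
        if not row.strip():
            continue  # skip blank separator lines between rows
        bases, position = parse_row(row)
        seq += bases
        if position is not None and position != len(seq):
            raise ValueError(f"row ends at {len(seq)}, listing says {position}")
    return seq


def extract_cds(seq: str, start: int, stop: int) -> str:
    """Return bases start..stop, read as 1-based inclusive coordinates (assumption)."""
    return seq[start - 1:stop]


def looks_like_orf(cds: str) -> bool:
    """True if the slice is whole codons, starts with ATG and ends with a stop codon."""
    return len(cds) % 3 == 0 and cds[:3] == "ATG" and cds[-3:] in STOP_CODONS


# Toy listing row (hypothetical data): a 16-base sequence whose coding region is 3-14.
rows = ["CCATGAAACC CTAAGG 16"]
seq = assemble(rows)
cds = extract_cds(seq, 3, 14)
print(cds, looks_like_orf(cds))  # ATGAAACCCTAA True
```

The running-position check is what catches a dropped or duplicated block: any row whose trailing number disagrees with the accumulated length raises an error instead of producing a shifted coding region.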



ACH2 DNA sequence 

Gene name: TIE tyrosine-protein kinase 

Unigene number: Hs.78824 

Probeset Accession #: X60957 

Nucleic Acid Accession #: NM_005424 cluster 

Coding sequence: 37-3452 (predicted start/stop codons underlined) 

CGCTCGTCCT GGCTGGCCTG GGTCGGCCTC TGGAGTATGG TCTGGCGGGT GCCCCCTTTC 60

TTGCTCCCCA TCCTCTTCTT GGCTTCTCAT GTGGGCGCGG CGGTGGACCT GACGCTGCTG 120

GCCAACCTGC GGCTCACGGA CCCCCAGCGC TTCTTCCTGA CTTGCGTGTC TGGGGAGGCC 180

GGGGCGGGGA GGGGCTCGGA CGCCTGGGGC CCGCCCCTGC TGCTGGAGAA GGACGACCGT 240

ATCGTGCGCA CCCCGCCCGG GCCACCCCTG CGCCTGGCGC GCAACGGTTC GCACCAGGTC 300 

ACGCTTCGCG GCTTCTCCAA GCCCTCGGAC CTCGTGGGCG TCTTCTCCTG CGTGGGCGGT 360 

GCTGGGGCGC GGCGCACGCG CGTCATCTAC GTGCAC*! ACA GCCCTGGAGC CCACCTGCTT 420 

CCAGACAAGG TCACACACAC TGTGAACAAA GGTGAGACCG CTGTACTTTC TGCACGTGTG 480 

CACAAGGAGA AGCAGACAGA CGTGATCTGG AAGAGCAACG GATCCTACTT CTACACCCTG 540

GACTGGCATG AAGCCCAGGA TGGGCGGTTC CTGCTGCAGC TCCCAAATGT GCAGCCACCA 600

TCGAGCGGCA TCTACAGTGC CACTTACCTG GAAGCCAGCC CCCTGGGCAG CGCCTTCTTT 660

CGGCTCATCG TGCGGGGTTG TGGGGCTGGG CGCTGGGGGC CAGGCTGTAC CAAGGAGTGC 720

CCAGGTTGCC TACATGGAGG TGTCTGCCAC GACCATGACG GCGAATGTGT ATGCCCCCCT 780 

GGCTTCACTG GCACCCGCTG TGAACAGGCC TGCAGAGAGG GCCGTTTTGG GCAGAGCTGC 840 

CAGGAGCAGT GCCCAGGCAT ATCAGGCTGC CGGGGCCTCA CCTTCTGCCT CCCAGACCCC 900 
TATGGCTGCT CTTGTGGATC TGGCTGGAGA GGAAGCCAGT GCCAAGAAGC TTGTGCCCCT 960

GGTCATTTTG GGGCTGATTG CCGACTCCAG TGCCAGTGTC AGAATGGTGG CACTTGTGAC 1020 

CGGTTCAGTG GTTGTGTCTG CCCCTCTGGG TGGCATGGAG TGCACTGTGA GAAGTCAGAC 1080 

CGGATCCCCC AGATCCTCAA CATGGCCTCA GAACTGGAGT TCAACTTAGA GACGATGCCC 1140 

CGGATCAACT GTGCAGCTGC AGGGAACCCC TTCCCCGTGC GGGGCAGCAT AGAGCTACGC 1200 

AAGCCAGACG GCACTGTGCT CCTGTCCACC AAGGCCATTG TGGAGCCAGA GAAGACCACA 1260

GCTGAGTTCG AGGTGCCCCG CTTGGTTCTT GCGGACAGTG GGTTCTGGGA GTGCCGTGTG 1320

TCCACATCTG GCGGCCAAGA CAGCCGGCGC TTCAAGGTCA ATGTGAAAGT GCCCCCCGTG 1380

CCCCTGGCTG CACCTCGGCT CCTGACCAAG CAGAGCCGCC AGCTTGTGGT CTCCCCGCTG 1440 

GTCTCGTTCT CTGGGGATGG ACCCATCTCC ACTGTCCGCC TGCACTACCG GCCCCAGGAC 1500 

AGTACCATGG ACTGGTCGAC CATTGTGGTG GACCCCAGTG AGAACGTGAC GTTAATGAAC 1560 

CTGAGGCCAA AGACAGGATA CAGTGTTCGT GTGCAGCTGA GCCGGCCAGG GGAAGGAGGA 1620

GAGGGGGCCT GGGGGCCTCC CACCCTCATG ACCACAGACT GTCCTGAGCC TTTGTTGCAG 1680 

CCGTGGTTGG AGGGCTGGCA TGTGGAAGGC ACTGACCGGC TGCGAGTGAG CTGGTCCTTG 1740 

CCCTTGGTGC CCGGGCCACT GGTGGGCGAC GGTTTCCTGC TGCGCCTGTG GGACGGGACA 1800 

CGGGGGCAGG AGCGGCGGGA GAACGTCTCA TCCCCCCAGG CCCGCACTGC CCTCCTGACG 1860 

GGACTCACGC CTGGCACCCA CTACCAGCTG GATGTGCAGC TCTACCACTG CACCCTCCTG 1920 

GGCCCGGCCT CGCCCCCTGC ACACGTGCTT CTGCCCCCCA GTGGGCCTCC AGCCCCCCGA 1980 

CACCTCCACG CCCAGGCCCT CTCAGACTCC GAGATCCAGC TGACATGGAA GCACCCGGAG 2040

GCTCTGCCTG GGCCAATATC CAAGTACGTT GTGGAGGTGC AGGTGGCTGG GGGTGCAGGA 2100 

GACCCACTGT GGATAGACGT GGACAGGCCT GAGGAGACAA GCACCATCAT CCGTGGCCTC 2160 

AACGCCAGCA CGCGCTACCT CTTCCGCATG CGGGCCAGCA TTCAGGGGCT CGGGGACTGG 2220 

AGCAACACAG TAGAAGAGTC CACCCTGGGC AACGGGCTGC AGGCTGAGGG CCCAGTCCAA 2280 

GAGAGCCGGG CAGCTGAAGA GGGCCTGGAT CAGCAGCTGA TCCTGGCGGT GGTGGGCTCC 2340 

GTGTCTGCCA CCTGCCTCAC CATCCTGGCC GCCCTTTTAA CCCTGGTGTG CATCCGCAGA 2400

AGCTGCCTGC ATCGGAGACG CACCTTCACC TACCAGTCAG GCTCGGGCGA GGAGACCATC 2460 

CTGCAGTTCA GCTCAGGGAC CTTGACACTT ACCCGGCGGC CAAAACTGCA GCCCGAGCCC 2520 

CTGAGCTACC CAGTGCTAGA GTGGGAGGAC ATCACCTTTG AGGACCTCAT CGGGGAGGGG 2580 

AACTTCGGCC AGGTCATCCG GGCCATGATC AAGAAGGACG GGCTGAAGAT GAACGCAGCC 2640

ATCAAAATGC TGAAAGAGTA TGCCTCTGAA AATGACCATC GTGACTTTGC GGGAGAACTG 2700

GAAGTTCTGT GCAAATTGGG GCATCACCCC AACATCATCA ACCTCCTGGG GGCCTGTAAG 2760 

AACCGAGGTT ACTTGTATAT CGCTATTGAA TATGCCCCCT ACGGGAACCT GCTAGATTTT 2820 

CTGCGGAAAA GCCGGGTCCT AGAGACTGAC CCAGCTTTTG CTCGAGAGCA TGGGACAGCC 2880 

TCTACCCTTA GCTCCCGGCA GCTGCTGCGT TTCGCCAGTG ATGCGGCCAA TGGCATGCAG 2940 

TACCTGAGTG AGAAGCAGTT CATCCACAGG GACCTGGCTG CCCGGAATGT GCTGGTCGGA 3000 

GAGAACCTAG CCTCCAAGAT TGCAGACTTC GGCCTTTCTC GGGGAGAGGA GGTTTATGTG 3060 

AAGAAGACGA TGGGGCGTCT CCCTGTGCGC TGGATGGCCA TTGAGTCCCT GAACTACAGT 3120 

GTCTATACCA CCAAGAGTGA TGTCTGGTCC TTTGGAGTCC TTCTTTGGGA GATAGTGAGC 3180 

CTTGGAGGTA CACCCTACTG TGGCATGACC TGTGCCGAGC TCTATGAAAA GCTGCCCCAG 3240 

GGCTACCGCA TGGAGCAGCC TCGAAACTGT GACGATGAAG TGTACGAGCT GATGCGTCAG 3300 

TGCTGGCGGG ACCGTCCCTA TGAGCGACCC CCCTTTGCCC AGATTGCGCT ACAGCTAGGC 3360 

CGCATGCTGG AAGCCAGGAA GGCCTATGTG AACATGTCGC TGTTTGAGAA CTTCACTTAC 3420 

GCGGGCATTG ATGCCACAGC TGAGGAGGCC TGAGCTGCCA TCCAGCCAGA ACGTGGCTCT 3480 

GCTGGCCGGA GCAAACTCTG CTGTCTAACC TGTGACCAGT CTGACCCTTA CAGCCTCTGA 3540 

CTTAAGCTGC CTCAAGGAAT TTTTTTAACT TAAGGGAGAA AAAAAGGGAT CTGGGGATGG 3600 

GGTGGGCTTA GGGGAACTGG GTTCCCATGC TTTGTAGGTG TCTCATAGCT ATCCTGGGCA 3660 

TCCTTCTTTC TAGTTCAGCT GCCCCACAGG TGTGTTTCCC ATCCCACTGC TCCCCCAACA 3720

CAAACCCCCA CTCCAGCTCC TTCGCTTAAG CCAGCACTCA CACCACTAAC ATGCCCTGTT 3780

CAGCTACTCC CACTCCCGGC CTGTCATTCA GAAAAAAATA AATGTTCTAA TAAGCTCCAA 3840 
AAAAA 



ACH3 DNA sequence 

Gene name: placental growth factor (PGF; P1GF1; VEGF-related protein)
Unigene number: Hs.2894
Probeset Accession #: X54936 

Nucleic Acid Accession #: NM_002632 cluster 

Coding sequence: 322-768 (predicted start/stop codons underlined) 

GGGATTCGGG CCGCCCAGCT ACGGGAGGAC CTGGAGTGGC ACTGGGCGCC CGACGG/ 5 CA 60 

TCCCCGGGAC CCGCCTGCCC CTCGGCGCCC CGCCCCGCCG GGCCGCTCCC CGTCGGCVTC 120 

CCCAGCCACA GCCTTACCTA CGGGCTCCTG ACTCCGCAAG GCTTCCAGAA GATGCTCGAA 180

CCACCGGCCG GGGCCTCGGG GCAGCAGTGA GGGAGGCGTC CAGCCCCCCA CTCAGCTCTT 240

CTCCTCCTGT GCCAGGGGCT CCCCGGGGGA TGAGCATGGT GGTTTTCCCT CGGAGCCCCC 300

TGGCTCGGGA CGTCTGAGAA GATGCCGGTC ATGAGGCTGT TCCCTTGCTT CCTGCAGCTC 360 

CTGGCCGGGC TGGCGCTGCC TGCTGTGCCC CCCCAGCAGT GGGCCTTGTC TGCTGGGAAC 420 

GGCTCGTCAG AGGTGGAAGT GGTACCCTTC CAGGAAGTGT GGGGCCGCAG CTACTGCCGG 480 

GCGCTGGAGA GGCTGGTGGA CGTCGTGTCC GAGTACCCCA GCGAGGTGGA GCACATGTTC 540 
AGCCCATCCT GTGTCTCCCT GCTGCGCTGC ACCGGCTGCT GCGGCGATGA GAATCTGCAC 600

TGTGTGCCGG TGGAGACGGC CAATGTCACC ATGCAGCTCC TAAAGATCCG TTCTGGGGAC 660 

CGGCCCTCCT ACGTGGAGCT GACGTTCTCT CAGCACGTTC GCTGCGAATG CCGGCCTCTG 720 

CGGGAGAAGA TGAAGCCGGA AAGGTGCGGC GATGCTGTTC CCCGGAGGTA ACCCACCCCT 780

TGGAGGAGAG AGACCCCGCA CCCGGCTCGT GTATTTATTA CCGTCACACT CTTCAGTGAC 840 

TCCTGCTGGT ACCTGCCCTC TATTTATTAG CCAACTGTTT CCCTGCTGAA TGCCTCGCTC 900 

CCTTCAAGAC GAGGGGCAGG GAAGGACAGG ACCCTCAGGA ATTCAGTGCC TTCAACAACG 960 

TGAGAGAAAG AGAGAAGCCA GCCACAGACC CCTGGGAGCT TCCGCTTTGA AAGAAGCAAG 1020 

ACACGTGGCC TCGTGAGGGG CAAGCTAGGC CCCAGAGGCC CTGGAGGTCT CCAGGGGCCT 1080 

GCAGAAGGAA AGAAGGGGGC CCTGCTACCT GTTCTTGGGC CTCAGGCTCT GCACAGACAA 1140 

GCAGCCCTTG CTTTCGGAGC TCCTGTCCAA AGTAGGGATG CGGATTCTGC TGGGGCCGCC 1200 

ACGGCCTGGT GGTGGGAAGG CCGGCAGCGG GCGGAGGGGA TTCAGCCACT TCCCCCTCTT 1260 

CTTCTGAAGA TCAGAACATT CAGCTCTGGA GAACAGTGGT TGCCTGGGGG CTTTTGCCAC 1320 

TCCTTGTCCC CCGTGATCTC CCGTCACACT TTGCCATTTG CTTGTACTGG GACATTGTTC 1380

TTTCCGGCCG AGGTGCCACC ACCCTGCCCC CACTAAGAGA CACATACAGA GTGGGCCCCG 1440

GGCTGGAGAA AGAGCTGCCT GGATGAGAAA CAGCTCAGCC AGTGGGGATG AGGTCACCAG 1500 

GGGAGGAGCC TGTGCGTCCC AGCTGAAGGC AGTGGCAGGG GAGCAGGTTC CCCAAGGGCC 1560 

CTGGCACCCC CACAAGCTGT CCCTGCAGGG CCATCTGACT GCCAAGCCAG ATTCTCTTGA 1620
ATAAAGTATT CTAGTGTGGA AACGC 



ACH4 DNA sequence

Gene name: nidogen 2 (NID2)

Unigene number: Hs.82733

Probeset Accession #: D86425 

Nucleic Acid Accession #: NM_007361 cluster 

Coding sequence: 1-4131 (predicted start/stop codons underlined) 

ATGGAGGGGG ACCGGGTGGC CGGGCGGCCG GTGCTGTCGT CGTTACCAGT GCTACTGCTG 60 

CTGCAGTTGC TAATGTTGCG GGCCGCGGCG CTGCACCCAG ACGAGCTCTT CCCACACGGG 120

GAGTCGTGGT GGGACCAGCT CCTGCAGGAA GGCGACGACG TAAAGCTCAG CCGTGGTGAA 180 

GCTGGCGAAT CCCCTGCACT TCTTACGAAG CCCGATTCAG CAACCTCTAC GTGGGCACCA 240 

ACGGCATCAT CTCCACTCAG GACTTCCCCA GGGAAACGCA GTATGTGGAC TATGATTTCC 300 

CCACCGACTT CCCGGCCATC GCCCCTTTTC TGGCGGACAT CGACACGAGC CACGGCAGAG 360 

GCCGAGTCCT GTACCGAGAG GACACCTCCC CCGCAGTGCT GGGCCTGGCC GCCCGCTATG 420 

TGCGCGCTGG CTTCCCGCGC TCTGCGCGCT TTTTACCCCC ACCCACGCCT TCCTGGCCAC 480 

CTGGGAGCAG GTAGGCGCTT ACGAGGAGGT CAAACGCGGG CGCTGCCCTC GGGAGAGCTG 540 

AACACTTTCC AGGCAGTTTT GGCATCTGAT GGGTCTGATA GCTACGCCCT CTTTCTTTAT 600 

CCTGCCAACG GCCTGCAGTT CCTTGGAACC CGCCCCAAAG AGTCTTACAA TGTCCAGCTT 660 

CAGCTTCCAG CTCGGGTGGG CTTCTGCCGA GGGGAGGCTG ATGATCTGAA GTCAGAAGGA 720 

CCATATTTCA GCTTGACTAG CACTGAACAG TCTGTGAAAA ATCTCTATCA ACTAAGCAAC 780 

CTGGGGATCC CTGGAGTGTG GGCTTTCCAT ATCGGCAGCA CTTCCCCGTT GGACAATGTC 840 

AGGCCAGCTG CAGTTGGAGA CCTTTCCGCT GCCCACTCTT CTGTTCCCCT GGGACGTTCC 900 

TTCAGCCATG CTACAGCCCT GGAAAGTGAC TATAATGAGG ACAATTTGGA TTACTACGAT 960 

GTGAATGAGG AGGAAGCTGA ATACCTTCCG GGTGAACCAG AGGAGGCATT GAATGGCCAC 1020 

AGCAGCATTG ATGTTTCCTT CCAATCCAAA GTGGATACAA AGCCTTTAGA GGAATCTTCC 1080 

ACCTTGGATC CTCACACCAA AGAAGGAACA TCTCTGGGAG AGGTAGGGGG CCCAGATTTA 1140 

AAAGGCCAAG TTGAGCCCTG GGATGAGAGA GAGACCAGAA GCCCAGCTCC ACCAGAGGTA 1200 

GACAGAGATT CACTGGCTCC TTCCTGGGAA ACCCCACCAC CGTACCCCGA AAACGGAAGC 1260 

ATCCAGCCCT ACCCAGATGG AGGGCCAGTG CCTTCGGAAA TGGATGTTCC CCCAGCTCAT 1320

CCTGAAGAAG AAATTGTTCT TCGAAGTTAC CCTGCTTCAG GTCACACTAC ACCCTTAAGT 1380 

CGAGGGACGT ATGAGGTGGG ACTGGAAGAC AACATAGGTT CCAACACCGA GGTCTTCACG 1440 

TATAATGCTG CCAACAAGGA AACCTGTGAA CACAACCACA GACAATGCTC CCGGCATGCC 1500 

TTCTGCACGG ACTATGCCAC TGGCTTCTGC TGCCACTGCC AATCCAAGTT TTATGGAAAT 1560

GGGAAGCACT GTCTGCCTGA GGGGGCACCT CACCGAGTGA ATGGGAAAGT GAGTGGCCAC 1620

CTCCACGTGG GCCATACACC CGTGCACTTC ACTGATGTGG ACCTGCATGC GTATATCGTG 1680

GGCAATGATG GCAGAGCCTA CACGGCCATC AGCCACATCC CACAGCCAGC AGCCCAGGCC 1740

CTCCTCCCCC TCACACCAAT TGGAGGCCTG TTTGGCTGGC TCTTTGCTTT AGAAAAACCT 1800

GGCTCTGAGA ACGGCTTCAG CCTCGCAGGT GCTGCCTTTA CCCATGACAT GGAAGTTACA 1860

T TACCCGG GAGAGGAGAC GGTTCGTATC ACTCAAACTG CTGAGGGACT TGACCCAGAG 1920

AACTACCTGA GCATTAAGAC CAACATTCAA GGCCAGGTGC CTTACGTCCC AGCAAATTTC 1980 

ACAGCCCACA TCTCTCCCTA CAAGGAGCTG TACCACTACT CCGACTCCAC TGTGACCTCT 2040 

ACAAGTTCCA GAGACTACTC TCTGACTTTT GGTGCAATCA ACCAAACATG GTCCTACCGC 2100

ATCCACCAGA ACATCACTTA CCAGGTGTGC AGGCACGCCC CCAGACACCC GTCCTTCCCC 2160

ACCACCCAGC AGCTGAACGT GGACCGGGTC TTTGCCTTGT ATAATGATGA AGAAAGAGTG 2220

CTTAGATTTG CTGTGACCAA TCAAATTGGC CCGGTCAAAG AAGATTCAGA CCCCACTCCG 2280

GTGAATCCTT GCTATGATGG GAGCCACATG TGTGACACAA CAGCACGGTG CCATCCAGGG 2340

ACAGGTGTAG ATTACACCTG TGAGTGCGCA TCTGGGTACC AGGGAGATGG ACGGAACTGT 2400
GTGGATGAAA ATGAATGTGC AACTGGCTTT CATCGCTGTG GCCCCAACTC TGTATGTATC 2460

AACTTGCCTG GAAGCTACAG GTGTGAGTGC CGGAGTGGTT ATGAGTTTGC AGATGACCGG 2520 

CATACTTGCA TCTTGATCAC CCCACCTGCC AACCCCTGTG AGGATGGCAG TCATACCTGT 2580 

GCTCCTGCTG GGCAGGCCCG GTGTGTTCAC CATGGAGGCA GCACGTTCAG CTGTGCCTGC 2640 

CTGCCTGGTT ATGCCGGCGA TGGGCACCAG TGCACTGATG TAGATGAATG CTCAGAAAAC 2700 

AGATGTCACC CTGCAGCTAC CTGCTACAAT ACTCCTGGTT CCTTCTCCTG CCGTTGTCAA 2760 

CCCGGATATT ATGGGGATGG ATTTCAGTGC ATACCTGACT CCACCTCAAG CCTGACACCC 2820 

TGTGAACAAC AGCAGCGCCA TGCCCAGGCC CAGTATGCCT ACCCTGGGGC CCGGTTCCAC 2880 

ATCCCCCAAT GCGACGAGCA GGGCAACTTC CTGCCCCTAC AGTGTCATGG CAGCACTGGT 2940 

TTCTGCTGGT GCGTGGACCC TGATGGTCAT GAAGTTCCTG GTACCCAGAC TCCACCTGGC 3000 

TCCACCCCGC CTCACTGTGG ACCATCACCA GAGCCCACCC AGAGGCCCCC GACCATCTGT 3060 

GAGCGCTGGA GGGAAAACCT GCTGGAGCAC TACGGTGGCA CCCCCCGAGA TGACCAGTAC 3120 

GTGCCCCAGT GCGATGACCT GGGCCACTTC ATCCCCCTGC AGTGCCACGG AAAGAGCGAC 3180 

TTCTGCTGGT GTGTGGACAA AGATGGCAGA GAGGTGCAGG GCACCCGCTC CCAGCCAGGC 3240

ACCACCCCTG CGTGTATACC CACCGTCGCT CCACCCATGG TCCGGCCCAC GCCCCGGCCA 3300

GATGTGACCC CTCCATCTGT GGGCACCTTC CTGCTCTATA CTCAGGGCCA GCAGATTGGC 3360

TACTTACCCC TCAATGGCAC CAGGCTTGAG AAGGATGCAG CTAAGACCCT GCTGTCTCTG 3420

CATGGCTCCA TAATCGTGGG AATTGATTAC GACTGCCGGG AGAGGATGGT GTACTGGACA 3480

GATGTTGCTG GACGGACAAT CAGCCGTGCC GGTCTGGAAC TGGGAGCAGA GCCTGAGACG 3540 

ATCGTGAATT CAGGTCTGAT AAGCCCTGAA GGACTTGCCA TAGACCACAT CCGCAGAACA 3600 

ATGTACTGGA CGGACAGTGT CCTGGATAAG ATAGAGAGCG CCCTGCTGGA TGGCTCTGAG 3660 

CGCAAGGTCC TCTTCTACAC AGATCTGGTG AATCCCCGTG CCATCGCTGT GGATCCAATC 3720 

CGAGGCAACT TGTACTGGAC AGACTGGAAT AGAGAAGCTC CTAAAATTGA AACGTCATCT 3780 

TTAGATGGAG AAAACAGAAG AATTCTGATC AATACAGACA TTGGATTGCC CAATGGCTTA 3840

ACCTTTGACC CTTTCTCTAA ACTGCTCTGC TGGGCAGATG CAGGAACCAA AAAACTGGAG 3900

TGTACACTAC CTGATGGAAC TGGACGGCGT GTCATTCAAA ACAACCTCAA GTACCCCTTC 3960

AGCATCGTAA GCTATGCAGA TCACTTCTAC CACACAGACT GGAGGAGGGA TGGTGTTGTA 4020

TCAGTAAATA AACATAGTGG CCAGTTTACT GATGAGTATC TCCCAGAACA ACGATCTCAC 4080

CTCTACGGGA TAACTGCAGT CTACCCCTAC TGCCCAACAG GAAGAAAGTA AGTACAGTAA 4140

TGTAAAGGAA GACTTGGAGT TTACAATCAG AACCTGGACC CTAAAGAACA GTGACTGCAA 4200

AGGCAAAGAA AGTAAAAAAG GAATTGGCCA TTAGACGTTC CTGAGCATCC AAGATGAACA 4260

TTTTGTAGTG CAAAAAGACT TTTGTGAAAA GCTGATACCT CAATCTTTAC TACTGTATTT 4320

TTAAAAATGA AGGTTGTTAT TGCAAGTTTA AAAAGGTAAC AGAATTTTAA CTGTTGCTTA 4380

TTAAAGCAAC TTCTTGTAAA CATTTATCAT TAATATTTAA AAGATCAAAT TCATTCAACT 4440 

AAGAATTAGA GTTTAAGACT CTAAACCTGA TTTTTGCCAT GGATTCCTTC TGGCCAAGAA 4500 

ATTAAAGCAC ATGTGATCAA TATAACAATA TAATCCTAAA CCTTGACAGT TGGAGAAGCC 4560

AATGCAGAAC TGATGGGAAA GGACCAATTA TTTATAGTTT CCCAACAAAA GTTCTAAGAT 4620 

TTTTTACCTC TGCATCAGTG CATTTCTATT TATATCAAAA GGTGCTAAAA TGATTCAATT 4680 

TGCATTTTCT GATCCTGTAG TGCCTCTATA GAAGTACCCA CAGAAAGTAA AGTATCACAT 4740

TTATAAATAC CAAAGATGTA AGAATTTTAA AATTTTCTAG ATTACTCCAA TAAAGTGTTT 4800 
TAAGTTTAAA AAAAAAAAAA AAAAAAAAA 



ACH5 DNA sequence 

Gene name: SNL (singed-like; sea urchin fascin homolog like)
Unigene number: Hs.118400
Probeset Accession #: U03057
Nucleic Acid Accession #: NM_003088

Coding sequence: 112-1593 (predicted start/stop codons underlined) 

GCGGAGGGTG CGTGCGGGCC GCGGCAGCCG AACAAAGGAG CAGGGGCGCC GCCGCAGGGA 60 

CCCGCCACCC ACCTCCCGGG GCCGCGCAGC GGCCTCTCGT CTACTGCCAC CATGACCGCC 120

AACGGCACAG CCGAGGCGGT GCAGATCCAG TTCGGCCTCA TCAACTGCGG CAACAAGTAC 180

CTGACGGCCG AGGCGTTCGG GTTCAAGGTG AACGCGTCCG CCAGCAGCCT GAAGAAGAAG 240

CAGATCTGGA CGCTGGAGCA GCCCCCTGAC GAGGCGGGCA GCGCGGCCGT GTGCCTGCGC 300

AGCCACCTGG GCCGCTACCT GGCGGCGGAC AAGGACGGCA ACGTGACCTG CGAGCGCGAG 360

GTGCCCGGTC CCGACTGCCG TTTCCTCATC GTGGCGCACG ACGACGGTCG CTGGTCGCTG 420

CAGTCCGAGG CGCACCGGCG CTACTTCGGC GGCACCGAGG ACCGCCTGTC CTGCTTCGCG 480

CAGACGGTGT CCCCCGCCGA GAAGTGGAGC GTGCACATCG CCATGCACCC TCAGGTCAAC 540 

ATCTACAGTG TCACCCGTAA GCvrOTACGCG CACCTGAGCG CGCGGCCGGC CGACGAGATC 600 

GCCGTGGACC GCGACGTGCC CTGGGGCGTC GACTCGCTCA TCACCCTCGC CTTCCAGGAC 660 

CAGCGCTACA GCGTGCAGAC CGCCGACCAC CGCTTCCTGC GCCACGACGG GCGCCTGGTG 720

GCGCGCCCCG AGCCGGCCAC TGGCTACACG CTGGAGTTCC GCTCCGGCAA GGTGGCCTTC 780

CGCGACTGCG AGGGCCGTTA CCTGGCGCCG TCGGGGCCCA GCGGCACGCT CAAGGCGGGC 840

AAGGCCACCA AGGTGGGCAA GGACGAGCTC TTTGCTCTGG AGCAGAGCTG CGCCCAGGTC 900

GTGCTGCAGG CGGCCAACGA GAGGAACGTG TCCACGCGCC AGGGTATGGA CCTGTCTGCC 960

AATCAGGACG AGGAGACCGA CCAGGAGACC TTCCAGCTGG AGATCGACCG CGACACCAAA 1020

AAGTGTGCCT TCCGTACCCA CACGGGCAAG TACTGGACGC TGACGGCCAC CGGGGGCGTG 1080
CAGTCCACCG CCTCCAGCAA GAATGCCAGC TGCTACTTTG ACATCGAGTG GCGTGACCGG 1140

CGCATCACAC TGAGGGCGTC CAATGGCAAG TTTGTGACCT CCAAGAAGAA TGGGCAGCTG 1200 

GCCGCCTCGG TGGAGACAGC AGGGGACTCA GAGCTCTTCC TCATGAAGCT CATCAACCGC 1260 

CCCATCATCG TGTTCCGCGG GGAGCATGGC TTCATCGGCT GCCGCAAGGT CACGGGCACC 1320 

CTGGACGCCA ACCGCTCCAG CTATGACGTC TTCCAGCTGG AGTTCAACGA TGGCGCCTAC 1380 

AACATCAAAG ACTCCACAGG CAAATACTGG ACGGTGGGCA GTGACTCCGC GGTCACCAGC 1440 

AGCGGCGACA CTCCTGTGGA CTTCTTCTTC GAGTTCTGCG ACTATAACAA GGTGGCCATC 1500 

AAGGTGGGCG GGCGCTACCT GAAGGGCGAC CACGCAGGCG TCCTGAAGGC CTCGGCGGAA 1560 

ACCGTGGACC CCGCCTCGCT CTGGGAGTAC TAGGGCCGGC CCGTCCTTCC CCGCCCCTGC 1620

CCACATGGCG GCTCCTGCCA ACCCTCCCTG CTAACCCCTT CTCCGCCAGG TGGGCTCCAG 1680

GGCGGGAGGC AAGCCCCCTT GCCTTTCAAA CTGGAAACCC CAGAGAAAAC GGTGCCCCCA 1740 

CCTGTCGCCC CTATGGACTC CCCACTCTCC CCTCCGCCCG GGTTCCCTAC TCCCCTCGGG 1800 

TCAGCGGCTG CGGCCTGGCC CTGGGAGGGA TTTCAGATGC CCCTGCCCTC TTGTCTGCCA 1860 

CGGGGCGAGT CTGGCACCTC TTTCTTCTGA CCTCAGACGG CTCTGAGCCT TATTTCTCTG 1920 

GAAGCGGCTA AGGGACGGTT GGGGGCTGGG AGCCCTGGGC GTGTAGTGTA ACTGGAATCT 1980 

TTTGCCTCTC CCAGCCACCT CCTCCCAGCC CCCCAGGAGA GCTGGGCACA TGTCCCAAGC 2040 

CTGTCAGTGG CCCTCCCTGG TGCACTGTCC CCGAAACCCC TGCTTGGGAA GGGAAGCTGT 2100 

CGGGAGGGCT AGGACTGACC CTTGTGGTGT TTTTTTGGGT GGTGGCTGGA AACAGCCCCT 2160 

CTCCCACGTG GGAGAGGCTC AGCCTGGCTC CCTTCCCTGG AGCGGCAGGG CGTGACGGCC 2220 

ACAGGGTCTG CCCGCTGCAC GTTCTGCCAA GGTGGTGGTG GCGGGCGGGT AGGGGTGTGG 2280 

GGGCCGTCTT CCTCCTGTCT CTTTCCTTTC ACCCTAGCCT GACTGGAAGC AGAAAATGAC 2340 

CAAATCAGTA TTTTTTTTAA TGAAATATTA TTGCTGGAGG CGTCCCAGGC AAGCCTGGCT 2400 

GTAGTAGCGA GTGATCTGGC GGGGGGCGTC TCAGCACCCT CCCCAGGGGG TGCATCTCAG 2460 

CCCCCTCTTT CCGTCCTTCC CGTCCAGCCC CAGCCCTGGG CCTGGGCTGC CGACACCTGG 2520 

GCCAGAGCCC CTGCTGTGAT TGGTGCTCCC TGGGCCTCCC GGGTGGATGA AGCCAGGCGT 2580 

CGCCCCCTCC GGGAGCCCTG GGGTGAGCCG CCGGGGCCCC CCTGCTGCCA GCCTCCCCCG 2640 

TCCCCAACAT GCATCTCACT CTGGGTGTCT TGGTCTTTTA TTTTTTGTAA GTGTCATTTG 2700

TATAACTCTA AACGCCCATG ATAGTAGCTT CAAACTGGAA ATAGCGAAAT AAAATAACTC 2760
AGTCTGC



ACH6 DNA sequence 

Gene name: endothelial protein C receptor (EPCR; PROCR) 
Unigene number: Hs.82353
Probeset Accession #: L35545
Nucleic Acid Accession #: NM_006404

Coding sequence: 25-741 (predicted start/stop codons underlined)

CAGGTCCGGA GCCTCAACTT CAGGATGTTG ACAACATTGC TGCCGATACT GCTGCTGTCT 60 

GGCTGGGCCT TTTGTAGCCA AGACGCCTCA GATGGCCTCC AAAGACTTCA TATGCTCCAG 120 

ATCTCCTACT TCCGCGACCC CTATCACGTG TGGTACCAGG GCAACGCGTC GCTGGGGGGA 180 

CACCTAACGC ACGTGCTGGA AGGCCCAGAC ACCAACACCA CGATCATTCA GCTGCAGCCC 240

TTGCAGGAGC CCGAGAGCTG GGCGCGCACG CAGAGTGGCC TGCAGTCCTA CCTGCTCCAG 300 

TTCCACGGCC TCGTGCGCCT GGTGCACCAG GAGCGGACCT TGGCCTTTCC TCTGACCATC 360 

CGCTGCTTCC TGGGCTGTGA GCTGCCTCCC GAGGGCTCTA GAGCCCATGT CTTCTTCGAA 420 

GTGGCTGTGA ATGGGAGCTC CTTTGTGAGT TTCCGGCCGG AGAGAGCCTT GTGGCAGGCA 480

GACACCCAGG TCACCTCCGG AGTGGTCACC TTCACCCTGC AGCAGCTCAA TGCCTACAAC 540

CGCACTCGGT ATGAACTGCG GGAATTCCTG GAGGACACCT GTGTGCAGTA TGTGCAGAAA 600

CATATTTCCG CGGAAAACAC GAAAGGGAGC CAAACAAGCC GCTCCTACAC TTCGCTGGTC 660 

CTGGGCGTCC TGGTGGGCGG TTTCATCATT GCTGGTGTGG CTGTAGGCAT CTTCCTGTGC 720 

ACAGGTGGAC GGCGATGTTA ATTACTCTCC AGCCCCGTCA GAAGGGGCTG GATTGATGGA 780

GGCTGGCAAG GGAAAGTTTC AGCTCACTGT GAAGCCAGAC TCCCCAACTG AAACACCAGA 840 

AGGTTTGGAG TGACAGCTCC TTTCTTCTCC CACATCTGCC CACTGAAGAT TTGAGGGAGG 900 

GGAGATGGAG AGGAGAGGTG GACAAAGTAC TTGGTTTGCT AAGAACCTAA GAACGTGTAT 960 

GCTTTGCTGA ATTAGTCTGA TAAGTGAATG TTTATCTATC TTTGTGGAAA ACAGATAATG 1020 

GAGTTGGGGC AGGAAGCCTA TGCGCCATCC TCCAAAGACA GACAGAATCA CCTGAGGCGT 1080 

TCAAAAGATA TAACCAAATA AACAAGTCAT CCACAATCAA AATACAACAT TCAATACTTC 1140

CAGGTGTGTC AGACTTGGGA TGGGACGCTG ATATAATAGG GTAGAAAGAA GTAACACGAA 1200

GAAGTGGTGG AAATGTAAAA TCCAAGTCAT ATGGCAGTGA TCAATTATTA ATCAATTAAT 1260
AATATTAATA AATTTCTTAT ATTT



ACH8 DNA sequence 

Gene name: melanoma adhesion molecule (MCAM; MUC18) 
Unigene number: Hs.211579
Probeset Accession #: D51069
Nucleic Acid Accession #: NM_006500

Coding sequence: 27-1967 (predicted start and stop codons underlined)
ACTTGCGTCT CGCCCTCCGG CCAAGCATGG GGCTTCCCAG GCTGGTCTGC GCCTTCTTGC 60

TCGCCGCCTG CTGCTGCTGT CCTCGCGTCG CGGGTGTGCC CGGAGAGGCT GAGCAGCCTG 120 

CGCCTGAGCT GGTGGAGGTG GAAGTGGGCA GCACAGCCCT TCTGAAGTGC GGCCTCTCCC 180 

AGTCCCAAGG CAACCTCAGC CATGTCGACT GGTTTTCTGT CCACAAGGAG AAGCGGACGC 240 

TCATCTTCCG TGTGCGCCAG GGCCAGGGCC AGAGCGAACC TGGGGAGTAC GAGCAGCGGC 300 

TCAGCCTCCA GGACAGAGGG GCTACTCTGG CCCTGACTCA AGTCACCCCC CAAGACGAGC 360 

GCATCTTCTT GTGCCAGGGC AAGCGCCCTC GGTCCCAGGA GTACCGCATC CAGCTCCGCG 420 

TCTACAAAGC TCCGGAGGAG CCAAACATCC AGGTCAACCC CCTGGGCATC CCTGTGAACA 480 

GTAAGGAGCC TGAGGAGGTC GCTACCTGTG TAGGGAGGAA CGGGTACCCC ATTCCTCAAG 540 

TCATCTGGTA CAAGAATGGC CGGCCTCTGA AGGAGGAGAA GAACCGGGTC CACATTCAGT 600 

CGTCCCAGAC TGTGGAGTCG AGTGGTTTGT ACACCTTGCA GAGTATTCTG AAGGCACAGC 660 

TGGTTAAAGA AGACAAAGAT GCCCAGTTTT ACTGTGAGCT CAACTACCGG CTGCCCAGTG 720 

GGAACCACAT GAAGGAGTCC AGGGAAGTCA CCGTCCCTGT TTTCTACCCG ACAGAAAAAG 780 

TGTGGCTGGA AGTGGAGCCC GTGGGAATGC TGAAGGAAGG GGACCGCGTG GAAATCAGGT 840 

GTTTGGCTGA TGGCAACCCT CCACCACACT TCAGCATCAG CAAGCAGAAC CCCAGCACCA 900 

GGGAGGCAGA GGAAGAGACA ACCAACGACA ACGGGGTCCT GGTGCTGGAG CCTGCCCGGA 960 

AGGAACACAG TGGGCGCTAT GAATGTCAGG CCTGGAACTT GGACACCATG ATATCGCTGC 1020 

TGAGTGAACC ACAGGAACTA CTGGTGAACT ATGTGTCTGA CGTCCGAGTG AGTCCCGCAG 1080 

CCCCTGAGAG ACAGGAAGGC AGGAGCCTCA CCCTGACCTG TGAGGCAGAG AGTAGCCAGG 1140 

ACCTCGAGTT CCAGTGGCTG AGAGAAGAGA CAGACCAGGT GCTGGAAAGG GGGCCTGTGC 1200 

TTCAGTTGCA TGACCTGAAA CGGGAGGCAG GAGGCGGCTA TCGCTGCGTG GCGTCTGTGC 1260 

CCAGCATACC CGGCCTGAAC CGCACACAGC TGGTCAAGCT GGCCATTTTT GGCCCCCCTT 1320 

GGATGGCATT CAAGGAGAGG AAGGTGTGGG TGAAAGAGAA TATGGTGTTG AATCTGTCTT 1380

GTGAAGCGTC AGGGCACCCC CGGCCCACCA TCTCCTGGAA CGTCAACGGC ACGGCAAGTG 1440

AACAAGACCA AGATCCACAG CGAGTCCTGA GCACCCTGAA TGTCCTCGTG ACCCCGGAGC 1500 

TGTTGGAGAC AGGTGTTGAA TGCACGGCCT CCAACGACCT GGGCAAAAAC ACCAGCATCC 1560 

TCTTCCTGGA GCTGGTCAAT TTAACCACCC TCACACCAGA CTCCAACACA ACCACTGGCC 1620 
TCAGCACTTC CACTGCCAGT CCTCATACCA GAGCCAACAG CACCTCCACA GAGAGAAAGp 1680

TGCCGGAGCC GGAGAGCCGG GGCGTGGTCA TCGTGGCTGT GATTGTGTGC ATCCTGGTCC 1740

TGGCGGTGCT GGGCGCTGTC CTCTATTTCC TCTATAAGAA GGGCAAGCTG CCGTGCAGGC 1800 

GCTCAGGGAA GCAGGAGATC ACGCTGCCCC CGTCTCGTAA GACCGAACTT GTAGTTGAAG 1860 

TTAAGTCAGA TAAGCTCCCA GAAGAGATGG GCCTCCTGCA GGGCAGCAGC GGTGACAAGA 1920 

GGGCTCCGGG AGACCAGGGA GAGAAATACA TCGATCTGAG GCATTAGCCC CGAATCACTT 1980

CAGCTCCCTT CCCTGCCTGG ACCATTCCCA GCTCCCTGCT CACTCTTCTC TCAGCCAAAG 2040 

CCTCCAAAGG GACTAGAGAG AAGCCTCCTG CTCCCCTCAC CTGCACACCC CCTTTCAGAG 2100 

GGCCACTGGG TTAGGACCTG AGGACCTCAC TTGGCCCTGC AAGCCGCTTT TCAGGGACCA 2160 

GTCCACCACC ATCTCCTCCA CGTTGAGTGA AGCTCATCCC AAGCAAGGAG CCCCAGTCTC 2220 

CCGAGCGGGT AGGAGAGTTT CTTGCAGAAC GTGTTTTTTC TTTACACACA TTATGGCTGT 2280

AAATACCTGG CTCCTGCCAG CAGCTGAGCT GGGTAGCCTC TCTGAGCTGG TTTCCTGCCC 2340 

CAAAGGCTGG CTTCCACCAT CCAGGTGCAC CACTGAAGTG AGGACACACC GGAGCCAGGC 2400 

GCCTGCTCAT GTTGAAGTGC GCTGTTCACA CCCGCTCCGG AGAGCACCCC AGCGGCATCC 2460 

AGAAGCAGCT GCAGTGTTGC TGCCACCACC CTCCTGCTCG CCTCTTCAAA GTCTCCTGTG 2520

ACATTTTTTC TTTGGTCAGA AGCCAGGAAC TGGTGTCATT CCTTAAAAGA TACGTGCCGG 2580 

GGCCAGGTGT GGTGGCTCAC GCCTGTAATC CCAGCACTTT GGGAGGCCGA GGCGGGCGGA 2640

TCACAAAGTC AGGACGAGAC CATCCTGGCT AACACGGTGA AACCCTGTCT CTACTAAAAA 2700 

TACAAAAAAA AATTAGCTAG GCGTAGTGGT TGGCACCTAT AGTCCCAGCT ACTCGGAAGG 2760 

CTGAAGCAGG AGAATGGTAT GAATCCAGGA GGTGGAGCTT GCAGTGAGCC GAGACCGTGC 2820 

CACTGCACTC CAGCCTGGGC AACACAGCGA GACTCCGTCT CGAGGAAAAA AAAAGAAAAG 2880 

ACGCGTACCT GCGGTGAGGA AGCTGGGCGC TGTTTTCGAG TTCAGGTGAA TTAGCCTCAA 2940 

TCCCCGTGTT CACTTGCTCC CATAGCCCTC TTGATGGATC ACGTAAAACT GAAAGGCAGC 3000 

GGGGAGCAGA CAAAGATGAG GTCTACACTG TCCTTCATGG GGATTAAAGC TATGGTTATA 3060 

TTAGCACCAA ACTTCTACAA ACCAAGCTCA GGGCCCCAAC CCTAGAAGGG CCCAAATGAG 3120 

AGAATGGTAC TTAGGGATGG AAAACGGGGC CTGGCTAGAG CTTCGGGTGT GTGTGTCTGT 3180

CTGTGTGTAT GCATACATAT GTGTGTATAT ATGGTTTTGT CAGGTGTGTA AATTTGCAAA 3240

TTGTTTCCTT TATATATGTA TGTATATATA TATATGAAAA TATATATATA TATGAAAAAT 3300

AAAGCTTAAT TGTCCCAGAA AATCATACAT TGCTTTTTTA TTCTACATGG GTACCACAGG 3360

AACCTGGGGG CCTGTGAAAC TACAACCAAA AGGCACACAA AACCGTTTCC AGTTGGCAGC 3420 

AGAGATCAGG GGTTACCTCT GCTTCTGAGC AAATGGCTCA AGCTCTACCA GAGCAGACAG 3480 

CTACCCTACT TTTCAGCAGC AAAACGTCCC GTATGACGCA GCACGAAGGG CCTGGCAGGC 3540
TGTTAGCAGG AGCTATGTCC CTTCCTATCG TTTCCGTCCA CTT 



ACH9 DNA sequence 

Gene name: endothelin-1 (EDN1)

Unigene number: Hs.2271

Probeset Accession #: J05008

Nucleic Acid Accession #: NM_001955
Coding sequence: 337-975 (predicted start/stop codons underlined)
GGAGCTGTTT ACCCCCACTC TAATAGGGGT TCAATATAAA AAGCCGGCAG AGAGCTGTCC 60

AAGTCAGACG CGCCTCTGCA TCTGCGCCAG GCGAACGGGT CCTGCGCCTC CTGCAGTCCC 120

AGCTCTCCAC CACCGCCGCG TGCGCCTGCA GACGCTCCGC TCGCTGCCTT CTCTCCTGGC 180

AGGCGCTGCC TTTTCTCCCC GTTAAAGGGC ACTTGGGCTG AAGGATCGCT TTGAGATCTG 240

AGGAACCCGC AGCGCTTTGA GGGACCTGAA GCTGTTTTTC TTCGTTTTCC TTTGGGTTCA 300

GTTTGAACGG GAGGTTTTTG ATCCCTTTTT TTCAGAATGG ATTATTTGCT CATGATTTTC 360

TCTCTGCTGT TTGTGGCTTG CCAAGGAGCT CCAGAAACAG CAGTCTTAGG CGCTGAGCTC 420

AGCGCGGTGG GTGAGAACGG CGGGGAGAAA CCCACTCCCA GTCCACCCTG GCGGCTCCGC 480

CGGTCCAAGC GCTGCTCCTG CTCGTCCCTG ATGGATAAAG AGTGTGTCTA CTTCTGCCAC 540

CTGGACATCA TTTGGGTCAA CACTCCCGAG CACGTTGTTC CGTATGGACT TGGAAGCCCT 600

AGGTCCAAGA GAGCCTTGGA GAATTTACTT CCCACAAAGG CAACAGACCG TGAGAATAGA 660

TGCCAATGTG CTAGCCAAAA AGACAAGAAG TGCTGGAATT TTTGCCAAGC AGGAAAAGAA 720

CTCAGGGCTG AAGACATTAT GGAGAAAGAC TGGAATAATC ATAAGAAAGG AAAAGACTGT 780

TCCAAGCTTG GGAAAAAGTG TATTTATCAG CAGTTAGTGA GAGGAAGAAA AATCAGAAGA 840

AGTTCAGAGG AACACCTAAG ACAAACCAGG TCGGAGACCA TGAGAAACAG CGTCAAATCA 900

TCTTTTCATG ATCCCAAGCT GAAAGGCAAG CCCTCCAGAG AGCGTTATGT GACCCACAAC 960

CGAGCACATT GGTGACAGAC TTCGGGGCCT GTCTGAAGCC ATAGCCTCCA CGGAGAGCCC 1020

TGTGGCCGAC TCTGCACTCT CCACCCTGGC TGGGATCAGA GCAGGAGCAT CCTCTGCTGG 1080

TTCCTGACTG GCAAAGGACC AGCGTCCTCG TTCAAAACAT TCCAAGAAAG GTTAAGGAGT 1140

TCCCCCAACC ATCTTCACTG GCTTCCATCA GTGGTAACTG CTTTGGTCTC TTCTTTCATC 1200

TGGGGATGAC AATGGACCTC TCAGCAGAAA CACACAGTCA CATTCGAATT C



ACJ1 DNA sequence

Gene name: BMX non-receptor tyrosine kinase
Unigene number: Hs.27372

Probeset Accession #: X83107
Nucleic Acid Accession #: NM_001721

Coding sequence: 34-2061 (predicted start/stop codons underlined)

GCAAGCACGG AACAAGCTGA GACGGATGAT AATATGGATA CAAAATCTAT TCTAGAAGAA 60

CTTCTTCTCA AAAGATCACA GCAAAAGAAG AAAATGTCAC CAAATAATTA CAAAGAACGG 120

CTTTTTGTTT TGACCAAAAC AAACCTTTCC TACTATGAAT ATGACAAAAT GAAAAGGGGC 180

AGCAGAAAAG GATCCATTGA AATTAAGAAA ATCAGATGTG TGGAGAAAGT AAATCTCGAG 240

GAGCAGACGC CTGTAGAGAG ACAGTACCCA TTTCAGATTG TCTATAAAGA TGGGCTTCTC 300

TATGTCTATG CATCAAATGA AGAGAGCCGA AGTCAGTGGT TGAAAGCATT ACAAAAAGAG 360

ATAAGGGGTA ACCCCCACCT GCTGGTCAAG TACCATAGTG GGTTCTTCGT GGACGGGAAG 420

TTCCTGTGTT GCCAGCAGAG CTGTAAAGCA GCCCCAGGAT GTACCCTCTG GGAAGCATAT 480

GCTAATCTGC ATACTGCAGT CAATGAAGAG AAACACAGAG TTCCCACCTT CCCAGACAGA 540

GTGCTGAAGA TACCTCGGGC AGTTCCTGTT CTCAAAATGG ATGCACCATC TTCAAGTACC 600

ACTCTAGCCC AATATGACAA CGAATCAAAG AAAAACTATG GCTCCCAGCC ACCATCTTCA 660

AGTACCAGTC TAGCGCAATA TGACAGCAAC TCAAAGAAAA TCTATGGCTC CCAGCCAAAC 720

TTCAACATGC AGTATATTCC AAGGGAAGAC TTCCCTGACT GGTGGCAAGT AAGAAAACTG 780

AAAAGTAGCA GCAGCAGTGA AGATGTTGCA AGCAGTAACC AAAAAGAAAG AAATGTGAAT 840

CACACCACCT CAAAGATTTC ATGGGAATTC CCTGAGTCAA GTTCATCTGA AGAAGAGGAA 900

AACCTGGATG ATTATGACTG GT^TGCTGGT AACATCTCCA GATCACAATC TGAACAGTTA 960

CTCAGACAAA AGGGAAAAGA AGGAGCATTT ATGGTTAGAA ATTCGAGCCA AGTGGGAATG 1020

TACACAGTGT CCTTATTTAG TAAGGCTGTG AATGATAAAA AAGGAACTGT CAAACATTAC 1080

CACGTGCATA CAAATGCTGA GAACAAATTA TACCTGGCAG AAAACTACTG TTTTGATTCC 1140

ATTCCAAAGC TTATTCATTA TCATCAACAC AATTCAGCAG GCATGATCAC ACGGCTCCGC 1200

CACCCTGTGT CAACAAAGGC CAACAAGGTC CCCGACTCTG TGTCCCTGGG AAATGGAATC 1260

TGGGAACTGA AAAGAGAAGA GATTACCTTG TTGAAGGAGC TGGGAAGTGG CCAGTTTGGA 1320

GTGGTCCAGC TGGGCAAGTG GAAGGGGCAG TATGATGTTG CTGTTAAGAT GATCAAGGAG 1380

GGCTCCATGT CAGAAGATGA ATTCTTTCAG GAGGCCCAGA CTATGATGAA ACTCAGCCAT 1440

CCCAAGCTGG TTAAATTCTA TGGAGTGTGT TCAAAGGAAT ACCCCATATA CATAGTGACT 1500

GAATATATAA GCAATGGCTG CTTGCTGAAT TACCTGAGGA GTCACGGAAA AGGACTTGAA 1560

CCTTCCCAGC TCTTAGAAAT GTGCTACGAT GTCTGTGAAG GCATGGCCTT CTTGGAGAGT 1620

CACCAATTO; TACACCGGGA CTTGGCTGCT CGTAACTGCT TGGTGGACAG AGATCTCTGT 1680

GTGAAAGTA'x CTGACTTTGG AATGACAAGG TATGTTCTTG ATGACCAGTA TGTCAGTTCA 1740

GTCGGAACAA AGTTTCCAGT CAAGTGGTCA GCTCCAGAGG TGTTTCATTA CTTCAAATAC 1800

AGCAGCAAGT CAGACGTATG GGCATTTGGG ATCCTGATGT GGGAGGTGTT CAGCCTGGGG 1860

AAGCAGCCCT ATGACTTGTA TGACAACTCC CAGGTGGTTC TGAAGGTCTC CCAGGGCCAC 1920

AGGCTTTACC GGCCCCACCT GGCATCGGAC ACCATCTACC AGATCATGTA CAGCTGCTGG 1980

CACGAGCTTC CAGAAAAGCG TCCCACATTT CAGCAACTCC TGTCTTCCAT TGAACCACTT 2040

CGGGAAAAAG ACAAGCATTG AAGAAGAAAT TAGGAGTGCT GATAAGAATG AATATAGATG 2100

CTGGCCAGCA TTTTCATTCA TTTTAAGGAA AGTAGGAAGG CATAAGTAAT TTTAGCTAGT 2160



TTTTAATAGT GTTCTCTGTA TTGTCTATTA TTTAGAAATG AACAAGGCAG GAAACAAAAG 2220

ATTCCCTTGA AATTTAGATC AAATTAGTAA TTTTGTTTTA TGCTGCTCCT GATATAACAC 2280

TTTCCAGCCT ATAGCAGAAG CACATTTTCA GACTGCAATA TAGAGACTGT GTTCATGTGT 2340

AAAGACTGAG CAGAACTGAA AAATTACTTA TTGGATATTC ATTCTTTTCT TTATATTGTC 2400
ATTGTCACAA CAATTAAATA TACTACCAAG TACAGAAATG TGGAAAAAAA AAACCG



ACJ4 DNA sequence 

Gene name: prostaglandin G/H synthase 2 (COX-2; PGHS-2)
Unigene number: Hs.196384

Probeset Accession #: D28235 
Nucleic Acid Accession #: NM_000963 

Coding sequence: 135-1949 (predicted start/stop codons underlined) 



CAATTGTCAT ACGACTTGCA GTGAGCGTCA GGAGCACGTC CAGGAACTCC TCAGCAGCGC 60

CTCCTTCAGC TCCACAGCCA GACGCCCTCA GACAGCAAAG CCTACCCCCG CGCCGCGCCC 120

TGCCCGCCGC TCGGATGCTC GCCCGCGCCC TGCTGCTGTG CGCGGTCCTG GCGCTCAGCC 180

ATACAGCAAA TCCTTGCTGT TCCCACCCAT GTCAAAACCG AGGTGTATGT ATGAGTGTGG 240

GATTTGACCA GTATAAGTGC GATTGTACCC GGACAGGATT CTATGGAGAA AACTGCTCAA 300

CACCGGAATT TTTGACAAGA ATAAAATTAT TTCTGAAACC CACTCCAAAC ACAGTGCACT 360

ACATACTTAC CCACTTCAAG GGATTTTGGA ACGTTGTGAA TAACATTCCC TTCCTTCGAA 420

ATGCAATTAT GAGTTATGTC TTGACATCCA GATCACATTT GATTGACAGT CCACCAACTT 480

ACAATGCTGA CTATGGCTAC AAAAGCTGGG AAGCCTTCTC TAACCTCTCC TATTATACTA 540

GAGCCCTTCC TCCTGTGCCT GATGATTGCC CGACTCCCTT GGGTGTCAAA GGTAAAAAGC 600

AGCTTCCTGA TTCAAATGAG ATTGTGGAAA AATTGCTTCT AAGAAGAAAG TTCATCCCTG 660

ATCCCCAGGG CTCAAACATG ATGTTTGCAT TCTTTGCCCA GCACTTCACG CATCAGTTTT 720

TCAAGACAGA TCATAAGCGA GGGCCAGCTT TCACCAACGG GCTGGGCCAT GGGGTGGACT 780

TAAATCATAT TTACGGTGAA ACTCTGGCTA GACAGCGTAA ACTGCGCCTT TTCAAGGATG 840

GAAAAATGAA ATATCAGATA ATTGATGGAG AGATGTATCC TCCCACAGTC AAAGATACTfc 900

AGGCAGAGAT GATCTACCCT CCTCAAGTCC CTGAGCATCT ACGGTTTGCT GTGGGGCAGG 960

AGGTCTTTGG TCTGGTGCCT GGTCTGATGA TGTATGCCAC AATCTGGCTG CGGGAACACA 1020

ACAGAGTATG CGATGTGCTT AAACAGGAGC ATCCTGAATG GGGTGATGAG CAGTTGTTCC 1080

AGACAAGCAG GCTAATACTG ATAGGAGAGA CTATTAAGAT TGTGATTGAA GATTATGTGC 1140

AACACTTGAG TGGCTATCAC TTCAAACTGA AATTTGACCC AGAACTACTT TTCAACAAAC 1200

AATTCCAGTA CCAAAATCGT ATTGCTGCTG AATTTAACAC CCTCTATCAC TGGCATCCCC 1260

TTCTGCCTGA CACCTTTCAA ATTCATGACC AGAAATACAA CTATCAACAG TTTATCTACA 1320

ACAACTCTAT ATTGCTGGAA CATGGAATTA CCCAGTTTGT TGAATCATTC ACCAGGCAAA 1380

TTGCTGGCAG GGTTGCTGGT GGTAGGAATG TTCCACCCGC AGTACAGAAA GTATCACAGG 1440

CTTCCATTGA CCAGAGCAGG CAGATGAAAT ACCAGTCTTT TAATGAGTAC CGCAAACGCT 1500

TTATGCTGAA GCCCTATGAA TCATTTGAAG AACTTACAGG AGAAAAGGAA ATGTCTGCAG 1560

AGTTGGAAGC ACTCTATGGT GACATCGATG CTGTGGAGCT GTATCCTGCC CTTCTGGTAG 1620

AAAAGCCTCG GCCAGATGCC ATCTTTGGTG AAACCATGGT AGAAGTTGGA GCACCATTCT 1680

CCTTGAAAGG ACTTATGGGT AATGTTATAT GTTCTCCTGC CTACTGGAAG CCAAGCACTT 1740

TTGGTGGAGA AGTGGGTTTT CAAATCATCA ACACTGCCTC AATTCAGTCT CTCATCTGCA 1800

ATAACGTGAA GGGCTGTCCC TTTACTTCAT TCAGTGTTCC AGATCCAGAG CTCATTAAAA 1860

CAGTCACCAT CAATGCAAGT TCTTCCCGCT CCGGACTAGA TGATATCAAT CCCACAGTAC 1920

TACTAAAAGA ACGTTCGACT GAACTGTAGA AGTCTAATGA TCATATTTAT TTATTTATAT 1980

GAACCATGTC TATTAATTTA ATTATTTAAT AATATTTATA TTAAACTCCT TATGTTACTT 2040

AACATCTTCT GTAACAGAAG TCAGTACTCC TGTTGCGGAG AAAGGAGTCA TACTTGTGAA 2100

GACTTTTATG TCACTACTCT AAAGATTTTG CTGTTGCTGT TAAGTTTGGA AAACAGTTTT 2160

TATTCTGTTT TATAAACCAG AGAGAAATGA GTTTTGACGT CTTTTTACTT GAATTTCAAC 2220

TTATATTATA AGAACGAAAG TAAAGATGTT TGAATACTTA AACACTATCA CAAGATGGCA 2280

AAATGCTGAA AGTTTTTACA CTGTCGATGT TTCCAATGCA TCTTCCATGA TGCATTAGAA 2340

GTAACTAATG TTTGAAATTT TAAAGTACTT TTGGTTATTT TTCTGTCATC AAACAAAAAC 2400

AGGTATCAGT GCATTATTAA ATGAATATTT AAATTAGACA TTACCAGTAA TTTCATGTCT 2460

ACTTTTTAAA ATCAGCAATG AAACAATAAT TTGAAATTTC TAAATTCATA GGGTAGAATC 2520

ACCTGTAAAA GCTTGTTTGA TTTCTTAAAG TTATTAAACT TGTACATATA CCAAAAAGAA 2580

GCTGTCTTGG ATTTAAATCT GTAAAATCAG ATGAAATTTT ACTACAATTG CTTGTTAAAA 2640

tatttt; AA GTGATGTTCC TTTTTCACCA AGAGTATAAA CCTTTTTAGT GTGACTGTTA 2700

AAACTTC ,TT TTAAATCAAA ATGCCAAATT TATTAAGGTG GTGGAGCCAC TGCAGTGTTA 2760

TCTCAAAATA AGAATATTTT GTTGAGATAT TCCAGAATTT GTTTATATGG CTGGTAACAT 2820

GTAAAATCTA TATCAGCAAA AGGGTCTACC TTTAAAATAA GCAATAACAA AGAAGAAAAC 2880

CAAATTATTG TTCAAATTTA GGTTTAAACT TTTGAAGCAA ACTTTTTTTT ATCCTTGTGC 2940

ACTGCAGGCC TGGTACTCAG ATTTTGCTAT GAGGTTAATG AAGTACCAAG CTGTGCTTGA 3000

ATAACGATAT GTTTTCTCAG ATTTTCTGTT GTACAGTTTA ATTTAGCAGT CCATATCACA 3060

TTGCAAAAGT AGCAATGACC TCATAAAATA CCTCTTCAAA ATGCTTAAAT TCATTTCACA 3120

CATTAATTTT ATCTCAGTCT TGAAGCCAAT TCAGTAGGTG CATTGGAATC AAGCCTGGCT 3180

ACCTGCATGC TGTTCCTTTT CTTTTCTTCT TTTAGCCATT TTGCTAAGAG ACACAGTCTT 3240

CTCATCACTT CGTTTCTCCT ATTTTGTTTT ACTAGTTTTA AGATCAGAGT TCACTTTCTT 3300

TGGACTCTGC CTATATTTTC TTACCTGAAC TTTTGCAAGT TTTCAGGTAA ACCTCAGCTC 3360 

AGGACTGCTA TTTAGCTCCT CTTAAGAAGA TTAAAAGAGA AAAAAAAAGG CCCTTTTAAA 3420 

AATAGTATAC ACTTATTTTA AGTGAAAAGC AGAGAATTTT ATTTATAGCT AATTTTAGCT 3480 

ATCTGTAACC AAGATGGATG CAAAGAGGCT AGTGCCTCAG AGAGAACTGT ACGGGGTTTG 3540 

TGACTGGAAA AAGTTACGTT CCCATTCTAA TTAATGCCCT TTCTTATTTA AAAACAAAAC 3600 

CAAATGATAT CTAAGTAGTT CTCAGCAATA ATAATAATGA CGATAATACT TCTTTTCCAC 3660 

ATCTCATTGT CACTGACATT TAATGGTACT GTATATTACT TAATTTATTG AAGATTATTA 3720 

TTTATGTCTT ATTAGGACAC TATGGTTATA AACTGTGTTT AAGCCTACAA TCATTGATTT 3780

TTTTTTGTTA TGTCACAATC AGTATATTTT CTTTGGGGTT ACCTCTCTGA ATATTATGTA 3840

AACAATCCAA AGAAATGATT GTATTAAGAT TTGTGAATAA ATTTTTAGAA ATCTGATTGG 3900 

CATATTGAGA TATTTAAGGT TGAATGTTTG TCCTTAGGAT AGGCCTATGT GCTAGCCCAC 3960 

AAAGAATATT GTCTCATTAG CCTGAATGTG CCATAAGACT GACCTTTTAA AATGTTTTGA 4020 

GGGATCTGTG GATGCTTCGT TAATTTGTTC AGCCACAATT TATTGAGAAA ATATTCTGTG 4080

TCAAGCACTG TGGGTTTTAA TATTTTTAAA TCAAACGCTG ATTACAGATA ATAGTATTTA 4140 

TATAAATAAT TGAAAAAAAT TTTCTTTTGG GAAGAGGGAG AAAATGAAAT AAATATCATT 4200 

AAAGATAACT CAGGAGAATC TTCTTTACAA TTTTACGTTT AGAATGTTTA AGGTTAAGAA 4260

AGAAATAGTC AATATGCTTG TATAAAACAC TGTTCACTGT TTTTTTTAAA AAAAAAACTT 4320 

GATTTGTTAT TAACATTGAT CTGCTGACAA AACCTGGGAA TTTGGGTTGT GTATGCGAAT 4380

GTTTCAGTGC CTCAGACAAA TGTGTATTTA ACTTATGTAA AAGATAAGTC TGGAAATAAA 4440
TGTCTGTTTA TTTTTGTACT ATTTA 



ACJ6 DNA sequence 

Gene name: SEC14-like 1

Unigene number: Hs.75232 

Probeset Accession #: D67029 

Nucleic Acid Accession #: NM_003003 

Coding sequence: 304-2451 (predicted start/stop codons underlined)

CAAGTGCCGT CGCCGCGCCC CTTCCCCCTC CCGCCTCCCC GGCCCCCTCC CCGGAACCGG 60 

CGGTCGAGCT ACGGTCGCGG ACGAGTGGAA CCGAGACTGC CCCGCGGAGC CGCCGGTATG 120 

AGCGCCCCTC GCCACCCCGT GTCCCAGGCC CGGCCTTTCT GACAAGAGCT AGACTTCGGG 180 

CTCCTTGAGG ATATTCAGTT TTGTATGTTT GAATATCCTC TCACCATGTT CAGCATAAAG 240

TACCATTCTT AATGATTATC CTCAACAAGA CAGGTGTGAG AGGGTTGCTG TTGCATTGCA 300 

ATCATGGTGC AAAAATACCA GTCCCCAGTG AGAGTGTACA AATACCCCTT TGAATTAATT 360

ATGGCTGCCT ATGAAAGGAG GTTCCCTACA TGTCCTTTGA TTCCGATGTT CGTGGGCAGT 420

GACACTGTGA GTGAATTCAA GAGCGAAGAT GGGGCTATTC ATGTCATTGA AAGGCGCTGC 480 

AAGCTGGATG TAGATGCACC CAGACTGCTG AAGAAGATTG CAGGAGTTGA TTATGTTTAT 540 

TTTGTCCAGA AAAACTCACT GAATTCTCGG GAACGTACTT TGCACATTGA GGCTTATAAT 600

GAAACGTTTT CCAATCGGGT CATCATTAAT GAGCATTGCT GCTACACCGT TCACCCTGAA 660 

AATGAAGATT GGACCTGTTT TGAACAGTCT GCAAGTTTAG ATATTAAATC TTTCTTTGGT 720

TTTGAAAGTA CAGTGGAAAA AATTGCAATG AAACAATATA CCAGCAACAT TAAAAAAGGA 780

AAGGAAATCA TCGAATACTA CCTTCGCCAA TTAGAAGAAG AAGGCATAAC CTTTGTGCCC 840 

CGTTGGAGTC CGCCTTCCAT CACGCCCTCT TCAGAGACAT CTTCATCATC CTCCAAGAAA 900 

CAAGCAGCGT CCATGGCCGT CGTCATCCCA GAAGCTGCCC TCAAGGAGGG GCTGAGTGGT 960 

GATGCCCTCA GCAGCCCCAG TGCACCTGAG CCCGTGGTGG GCACCCCTGA CGACAAACTA 1020 

GATGCCGACC ACATCAAGAG ATACCTGGGC GATTTGACTC CGCTGCAGGA GAGCTGCCTC 1080 

ATTAGACTTC GCCAGTGGCT CCAGGAGACC CACAAGGGCA AAATTCCAAA AGATGAGCAT 1140

ATTCTTCGGT TCCTCCGTGC ACGGGATTTT AATATTGACA AAGCCAGAGA GATCATGTGT 1200 

CAGTCTTTGA CGTGGAGAAA GCAGCATCAG GTAGACTACA TTCTTGAAAC CTGGACCCCT 1260 

CCTCAGGTCC TTCAGGATTA CTACGCGGGA GGCTGGCATC ATCACGACAA AGATGGGCGG 1320 

CCCCTCTACG TGCTCAGGCT GGGGCAGATG GACACCAAAG GCTTGGTGAG AGCGCTCGGG 1380

GAGGAAGCCC TGCTGAGATA CGTTCTCTCC GTAAATGAAG AACGGCTAAG GCGATGCGAA 1440

GAGAATACAA AAGTCTTTGG TCGGCCTATC AGCTCATGGA CCTGCCTGGT GGACTTGGAA 1500 

GGGCTGAACA TGCGCCACTT GTGGAGACCT GGTGTGAAAG CGCTGCTGCG GATCATCGAG 1560

GTGGTGGAGG CCAACTACCC TGAGACACTG GGCCGCCTTC TCATCCTGCG GGCGCCCAGG 1620

GTATTTCCTG TGCTCTGGAC GCTGGTTAGT CCGTTCATTG ATGACAACAC CAGAAGGAAG 1680 

TTCCTCATTT ATGCAGGAAA TGACTAC * *\G GGTCCTGGAG GCCTGCTGGA TTACATCGAC 1740

AAAGAGATTA TTCCAGATTT CCTGAG1 3G GAGTGCATGT GCGAAGTGCC AGAGGGTGGA 1800 

CTGGTCCCCA AATCTCTGTA CCGGACTGCA GAGGAGCTGG AGAACGAAGA CCTGAAGCTC 1860 

TGGACTGAGA CCATCTACCA GTCTGCAAGC GTCTTCAAAG GAGCCCCACA TGAGATTCTC 1920

ATTCAGATTG TGGATGCCTC GTCAGTCATC ACTTGGGATT TCGACGTGTG CAAAGGGGAC 1980

ATTGTGTTTA ACATCTATCA CTCCAAGAGG TCGCCACAAC CACCCAAAAA GGACTCCCTG 2040

GGAGCCCACA GCATCACCTC TCCGGGTGGG AACAATGTGC AGCTCATAGA CAAAGTCTGG 2100 

CAGCTGGGCC GCGACTACAG CATGGTGGAG TCGCCTCTGA TCTGCAAAGA AGGAGAAAGC 2160 

GTGCAGGGTT CCCATGTGAC CAGGTGGCCG GGCTTCTACA TCCTGCAGTG GAAATTCCAC 2220 

AGCATGCCTG CGTGCGCCGC CAGCAGCCTT CCCCGGGTGG ACGACGTGCT TGCGTCCCTG 2280 
CAGGTCTCTT CGCACAAGTG TAAAGTGATG TACTACACCG AGGTGATCGG CTCGGAGGAT 2340

TTCAGAGGTT CCATGACGAG CCTGGAGTCC AGCCACAGCG GCTTCTCCCA GCTGAGTGCC 2400 

GCCACCACCT CCTCCAGCCA GTCCCACTCC AGCTCCATGA TCTCCAGGTA GTGCCGCGCT 2460

GCCTGCACCT AGTGTGCAGA GGGGACGGCC GCCCCTCCTC GGACAGCAGC TGCACCCGCC 2520 

CACCCAGCGG CGACATTGTA CAGACTCCTC TCACCTCTAG ATAGCAAATA GCTCTCAGAT 2580 

GGTAAACGTA GTCGTTTGAT CCCAAAACTA CCTTGGCAGG TAGTTTTAAC TCTGATCCTA 2640 

ACTTAACTCA ATAGCCATAG ATTTTGTATA CGTTGTGCAC AAAATCCAAC CAGAGCGCAA 2700 

GGGCTCTCTT GAAAGAAAAG TAGTTTCTGT ACCAATTAAA GGATTGACGT GGTCTCAGAT 2760

ATTGATGCAA AAAATTTTTC CAACGAACTC CGCATTGTCC ATTAGTGAAT GAATTCCTGT 2820 

GACATCCTCC AGAGATGGCC CCTCCTCACC TGGGACGGAA GCTGCCAGCT CGCTTCCCCC 2880

AAGCTGCCTC ATGGCCCGCA CGCCGCCTCA CGGCCCCCAT GCTTCCCGCC AGTCAAGATG 2940 

GTCTGTGGAC TTAGGGCCAG CCCTTGAGGT CCTTATCCTC TGAGGATTCA GAGGTTGCCT 3000 

GCGGAGTACC TTGTCCCAGG GCCAGACACA CCCACACCAC CCACTGTCTG CAGTGGGGCC 3060 

GGGGGCTCAG GAGGGGCTCT CAGGGACTCC TGGTGACTCC AGGAAAATGC TGCCATCGTT 3120 

AAACATTACT TTCTCTTTCC TCCTTTTCAA ATCTTTTTGA TACTTTTTAG AGCAGGATTT 3180 

TTCTGTATGT GAACTTGGGT GGGGGGGTTC TTCCCGTTTC CTTCCGTGCG TCGCCCCTCT 3240

CACCTGCAGT CAGCTCCCAG CCCAGTGTAG GCCATCTCCT CTGTGCCCTC TGGAGGCTCA 3300

TTGTCTCAGA GCCCAGACAG TTCCAGCCAC TAGGAGGCCG TCTTGGAACC AGCAAGTCGC 3360 

ATTTGCCACT TGACACTGTC CATGGGGTTT TATTAGTAGC TAAGCAGCAG CTCTCGCATC 3420 

CACTTCAGGG TGGCGTGTGG CATGTAGGAG TCCTGCTTCT TTGTACATGG GAATTGTGGA 3480 

CTCATGCGTG TGTGTGTGTG CATGTGCTGT GTGTGTGCAT GTGTGCATGA CGGTGGGGGT 3540

GCTGGGGGGA CGGGGTGAGT GGAAACTTAG TTTGAGTAAT GAAGGAATCT TCACAGAAGC 3600 

AAATCAGAAT ATGGGATTTG TTTGCCTTTT ACATTTTGTT TAATTCCTGA TTTTAAAGCC 3660 

TGCTCTATCT GGTACAGGCC CTTATTTTTT CAGCTTTTTA TGGGAAAAGC AGGTTATTTG 3720

AGAATCTGTC CAGAAGTTGC ATAGGGGATG GCCTCCACGA TAAGGACATG CAACACGTGT 3780

TTCTGTGTGC AGCAGAGGCC GTGTTTTTCA TGCCAAACCC CACGCGGCTG TCAACTGTGT 3840

GCGTGGTAGG CATGGAGATC CTGGTTGTGC CGTCTCAGCT CCGCTCTGAA GGCACTGTGT 3900

GGGTGCTGCG TGACTGGAGA GCTGTGTGGA GGCCATGTGT GCCCCGTGCA GGGATCAGGA 3960

GGGCGGGGGA GGGACCGAGC AGCCCTCTTG CCCGGTCGGG TCAGCCCTAG TGGCTGCCTG 4020

CACACTGTAG ACGTCCCAGG GCCTGTGCTG TGATCACCTG CCTTTGGACC ACATTTGTGT 4080

TTGCTCTTAG AGATCGAGCT CCTCAGTGGT ACCTGAAGCC TTTGCTTCCG GAAAGCGCGG 4140 

TAGGGTTCGT AGGTAGGGCT AGTAGGTAGG GTTAGTAGGT AGGGGTAGTA GGTAGGGCTA 4200

GTAGGTAGGG TTAGTAGGTA GGGTTCGTAG GTAGGGCTGG TAGGTAGGGT TAGTAGGTAG 4260 

GGCTAGTAGG TAGGGTTCGT AGGTAGGGCT AGTAGGTAGG GTTAGTAGGT AGGGCTAGTA 4320 

GGTAGGGCTA GTAGGTAGGG TTAGTAGGTA GGGTTCGTAG GTAGGGCTGG TAGGTAGGGT 4380 

TAGTAGGTAG GGCTAGTAGG TAGGGTTCGT AGGTAGGGCT AGTAGGTAGG GTTAGTAGGT 4440 

AGGGCTAGTA GGTAGGGCTA GTAGGTAGGG TTAGTAGGTA GGGTTCGTAG GTAGGGCTGG 4500 

TAGGTAGGGT TAGTAGGTAG GGCTAGTAGG TAGGGCTAGT AGGTAGGGCT AGTAGGTAGG 4560 

GTTAGTAGGT AGGGCTAGTA GGTAGGGCTA GTAGGTAGGG TTAGTAGGTA GGGTTCGTAG 4620 

GTAGGGCTGG TAGGTAGGGT TAGTAGGTAG GGCTAGTAGG TAGGGCTAGT AGGTAGGGCT 4680 

AGTAGGTAGG GCTAGTAGGT AGGGCTAGTA GGTAGGGCTA GTAGGTAGGG CTAGTAGGTA 4740 

GGGTTCGTAG GTAGGGTTCG TAGGTAGGGT TCGTAGGTAG GGTTAGTAGC GCGTCTGTGC 4800 

TGCTTCCACC TGGTGCTTCC TGTTCCCAAA TCACAAGGGC CTGAAGGTGG TCCCTGCTTT 4860

CTCTTTCTCT TTCTCTGTGT CTCAGATGGC GATTTTGCTG ACAGCTGCCA AGAAAATGCT 4920 

TCACTCAACA GTCCTCATGT GCCCAGAGAT GTTTATAGAA CTGTTTGAAT TGCAGCCATC 4980 

CCCTGCCCCC TCCCAGGCTG AAGATCTGTT CTTTTTAAGT TGATTCGGGA GTGGCATTCT 5040

TTTATACCCA AAGACTGTAG TGCATCTTGA AGAGCTCAAA GCACATGACC GCACAAATGC 5100 

TTACAGGGTT TCCTCCCGAG TAATCCAATC TCACTCCCCT TGTAAGGGAA TTCTGGGGCA 5160 

GCTATGGTTT GAGTATGCAG TTTGCATCGT GTTTCTACCT TTAGTACCTT GCCACTCTTT 5220 

TAAAACGCTG CTGTCATTTC CCATTTCTTA GTACTAATGA TTCTTTGATT CTCCCTCTAT 5280

TATGTCTTAA TTCACTTTCC TTCCTAAATT TGTTATTTGC ATATCAAATT CTGTAAATGT 5340 

TTTGTAAACA TATTACCTCA CTTGGTAATA CAATACTGAT AGTCTTTAAA AGATTTTTTT 5400
ATTGTTATCA ATAATAAATG TGAACTATTT AAAG



ACJ8 DNA sequence

Gene name: intercellular adhesion molecule 1 (ICAM1; CD54)
Unigene number: Hs.168383
Probeset Accession #: M24283
Nucleic Acid Accession #: NM_000201

Coding sequence: 58-1656 (predicted start/stop codons underlined)

GCGCCCCAGT CGACGCTGAG CTCCTCTGCT ACTCAGAGTT GCAACCTCAG CCTCGCTATG 60 

GCTCCCAGCA GCCCCCGGCC CGCGCTGCCC GCACTCCTGG TCCTGCTCGG GGCTCTGTTC 120 

CCAGGACCTG GCAATGCCCA GACATCTGTG TCCCCCTCAA AAGTCATCCT GCCCCGGGGA 180

GGCTCCGTGC TGGTGACATG CAGCACCTCC TGTGACCAGC CCAAGTTGTT GGGCATAGAG 240

ACCCCGTTGC CTAAAAAGGA GTTGCTCCTG CCTGGGAACA ACCGGAAGGT GTATGAACTG 300

AGCAATGTGC AAGAAGATAG CCAACCAATG TGCTATTCAA ACTGCCCTGA TGGGCAGTCA 360
ACAGCTAAAA CCTTCCTCAC CGTGTACTGG ACTCCAGAAC GGGTGGAACT GGCACCCCTC 420

CCCTCTTGGC AGCCAGTGGG CAAGAACCTT ACCCTACGCT GCCAGGTGGA GGGTGGGGCA 480

CCCCGGGCCA ACCTCACCGT GGTGCTGCTC CGTGGGGAGA AGGAGCTGAA ACGGGAGCCA 540

GCTGTGGGGG AGCCCGCTGA GGTCACGACC ACGGTGCTGG TGAGGAGAGA TCACCATGGA 600

GCCAATTTCT CGTGCCGCAC TGAACTGGAC CTGCGGCCCC AAGGGCTGGA GCTGTTTGAG 660

AACACCTCGG CCCCCTACCA GCTCCAGACC TTTGTCCTGC CAGCGACTCC CCCACAACTT 720

GTCAGCCCCC GGGTCCTAGA GGTGGACACG CAGGGGACCG TGGTCTGTTC CCTGGACGGG 780

CTGTTCCCAG TCTCGGAGGC CCAGGTCCAC CTGGCACTGG GGGACCAGAG GTTGAACCCC 840

ACAGTCACCT ATGGCAACGA CTCCTTCTCG GCCAAGGCCT CAGTCAGTGT GACCGCAGAG 900

GACGAGGGCA CCCAGCGGCT GACGTGTGCA GTAATACTGG GGAACCAGAG CCAGGAGACA 960

CTGCAGACAG TGACCATCTA CAGCTTTCCG GCGCCCAACG TGATTCTGAC GAAGCCAGAG 1020

GTCTCAGAAG GGACCGAGGT GACAGTGAAG TGTGAGGCCC ACCCTAGAGC CAAGGTGACG 1080

CTGAATGGGG TTCCAGCCCA GCCACTGGGC CCGAGGGCCC AGCTCCTGCT GAAGGCCACC 1140

CCAGAGGACA ACGGGCGCAG CTTCTCCTGC TCTGCAACCC TGGAGGTGGC CGGCCAGCTT 1200

ATACACAAGA ACCAGACCCG GGAGCTTCGT GTCCTGTATG GCCCCCGACT GGACGAGAGG 1260

GATTGTCCGG GAAACTGGAC GTGGCCAGAA AATTCCCAGC AGACTCCAAT GTGCCAGGCT 1320

TGGGGGAACC CATTGCCCGA GCTCAAGTGT CTAAAGGATG GCACTTTCCC ACTGCCCATC 1380

GGGGAATCAG TGACTGTCAC TCGAGATCTT GAGGGCACCT ACCTCTGTCG GGCCAGGAGC 1440

ACTCAAGGGG AGGTCACCCG CGAGGTGACC GTGAATGTGC TCTCCCCCCG GTATGAGATT 1500

GTCATCATCA CTGTGGTAGC AGCCGCAGTC ATAATGGGCA CTGCAGGCCT CAGCACGTAC 1560

CTCTATAACC GCCAGCGGAA GATCAAGAAA TACAGACTAC AACAGGCCCA AAAAGGGACC 1620

CCCATGAAAC CGAACACACA AGCCACGCCT CCCTGAACCT ATCCCGGGAC AGGGCCTCTT 1680

CCTCGGCCTT CCCATATTGG TGGCAGTGGT GCCACACTGA ACAGAGTGGA AGACATATGC 1740

CATGCAGCTA CACCTACCGG CCCTGGGACG CCGGAGGACA GGGCATTGTC CTCAGTCAGA 1800

TACAACAGCA TTTGGGGCCA TGGTACCTGC ACACCTAAAA CACTAGGCCA CGCATCTGAT 1860

CTGTAGTCAC ATGACTAAGC CAAGAGGAAG GAGCAAGACT CAAGACATGA TTGATGGATG 1920

TTAAAGTCTA GCCTGATGAG AGGGGAAGTG GTGGGGGAGA CATAGCCCCA CCATGAGGAC 1980

ATACAACTGG GAAATACTGA AACTTGCTGC CTATTGGGTA TGCTGAGGCC CACAGACTTA 2040

CAGAAGAAGT GGCCCTCCAT AGACATGTGT AGCATCAAAA CACAAAGGCC CACACTTCCT 2100

GACGGATGCC AGCTTGGGCA CTGCTGTCTA CTGACCCCAA CCCTTGATGA TATGTATTTA 2160

TTCATTTGTT ATTTTACCAG CTATTTATTG AGTGTCTTTT ATGTAGGCTA AATGAACATA 2220

GGTCTCTGGC CTCACGGAGC TCCCAGTCCA TGTCACATTC AAGGTCACCA GGTACAGTTG 2280

TACAGGTTGT ACACTGCAGG AGAGTGCCTG GCAAAAAGAT CAAATGGGGC TGGGACTTCT 2340

CATTGGCCAA CCTGCCTTTC CCCAGAAGGA GTGATTTTTC TATCGGCACA AAAGCACTAT 2400

ATGGACTGGT AATGGTTCAC AGGTTCAGAG ATTACCCAGT GAGGCCTTAT TCCTCCCTTC 2460

CCCCCAAAAC TGACACCTTT GTTAGCCACC TCCCCACCCA CATACATTTC TGCCAGTGTT 2520

CACAATGACA CTCAGCGGTC ATGTCTGGAC ATGAGTGCCC AGGGAATATG CCCAAGCTAT 2580

GCCTTGTCCT CTTGTCCTGT TTGCATTTCA CTGGGAGCTT GCACTATTGC AGCTCCAGTT 2640

TCCTGCAGTG ATCAGGGTCC TGCAAGCAGT GGGGAAGGGG GCCAAGGTAT TGGAGGACTC 2700

CCTCCCAGCT TTGGAAGGGT CATCCGCGTG TGTGTGTGTG TGTATGTGTA GACAAGCTCT 2760

CGCTCTGTCA CCCAGGCTGG AGTGCAGTGG TGCAATCATG GTTCACTGCA GTCTTGACCT 2820

TTTGGGCTCA AGTGATCCTC CCACCTCAGC CTCCTGAGTA GCTGGGACCA TAGGCTCACA 2880

ACACCACACC TGGCAAATTT GATTTTTTTT TTTTTTTTCA GAGACGGGGT CTCGCAACAT 2940

TGCCCAGACT TCCTTTGTGT TAGTTAATAA AGCTTTCTCA ACTGCC



ACK3 DNA sequence

Gene name: angiopoietin 1 receptor (TIE-2; TEK)
Unigene number: Hs.89640 
Probeset Accession #: L06139 
Nucleic Acid Accession #: NM_000459 

Coding sequence: 149-3523 (predicted start/stop codons underlined) 



CTTCTGTGCT GTTCCTTCTT GCCTCTAACT TGTAAACAAG ACGTACTAGG ACGATGCTAA 60
TGGAAAGTCA CAAACCGCTG GGTTTTTGAA AGGATCCTTG GGACCTCATG CACATTTGTG 120
GAAACTGGAT GGAGAGATTT GGGGAAGCAT GGACTCTTTA GCCAGCTTAG TTCTCTGTGG 180
AGTCAGCTTG CTCCTTTCTG GAACTGTGGA AGGTGCCATG GACTTGATCT TGATCAATTC 240
CCTACCTCTT GTATCTGATG CTGAAACATC TCTCACCTGC ATTGCCTCTG GGTGGCGCCC 300
CCATGAGCCC ATCACCATAG GAAGGGACTT TGAAGCCTTA ATGAACCAGC ACCAGGATCC 360
GCTGGAAGTT ACTCAAGATG TGACCAGAGA ATGGGCTAAA AAAGTTGTTT GGAAGAGAGA 420
AAAGGCTAGT AAGATCAATG GTGCTTATTT CTGTGAAGGG CGAGTTCGAG GAGAGGCAAT 480
CAGGATACGA ACCATGAAGA TGCGTCAACA AGCTTCCTTC CTACCAGCTA CTTTAACTAT 540
GACTGTGGAC AAGGGAGATA ACGTGAACAT ATCTTTCAAA AAGGTATTGA TTAAAGAAGA 600
AGATGCAGTG ATTTACAAAA ATGGTTCCTT CATCCATTCA GTGCCCCGGC ATGAAGTACC 660
TGATATTCTA GAAGTACACC TGCCTCATGC TCAGCCCCAG GATGCTGGAG TGTACTCGGC 720
CAGGTATATA GGAGGAAACC TCTTCACCTC GGCCTTCACC AGGCTGATAG TCCGGAGATG 780
TGAAGCCCAG AAGTGGGGAC CTGAATGCAA CCATCTCTGT ACTGCTTGTA TGAACAATGG 840
TGTCTGCCAT GAAGATACTG GAGAATGCAT TTGCCCTCCT GGGTTTATGG GAAGGACGTG 900
TGAGAAGGCT TGTGAACTGC ACACGTTTGG CAGAACTTGT AAAGAAAGGT GCAGTGGACA 960
AGAGGGATGC AAGTCTTATG TGTTCTGTCT CCCTGACCCC TATGGGTGTT CCTGTGCCAC 1020
AGGCTGGAAG GGTCTGCAGT GCAATGAAGC ATGCCACCCT GGTTTTTACG GGCCAGATTG 1080 
TAAGCTTAGG TGCAGCTGCA ACAATGGGGA GATGTGTGAT CGCTTCCAAG GATGTCTCTG 1140 
CTCTCCAGGA TGGCAGGGGC TCCAGTGTGA GAGAGAAGGC ATACCGAGGA TGACCCCAAA 1200 
GATAGTGGAT TTGCCAGATC ATATAGAAGT AAACAGTGGT AAATTTAATC CCATTTGCAA 1260 
AGCTTCTGGC TGGCCGCTAC CTACTAATGA AGAAATGACC CTGGTGAAGC CGGATGGGAC 1320 
AGTGCTCCAT CCAAAAGACT TTAACCATAC GGATCATTTC TCAGTAGCCA TATTCACCAT 1380 
CCACCGGATC CTCCCCCCTG ACTCAGGAGT TTGGGTCTGC AGTGTGAACA CAGTGGCTGG 1440
GATGGTGGAA AAGCCCTTCA ACATTTCTGT TAAAGTTCTT CCAAAGCCCC TGAATGCCCC 1500
AAACGTGATT GACACTGGAC ATAACTTTGC TGTCATCAAC ATCAGCTCTG AGCCTTACTT 1560 
TGGGGATGGA CCAATCAAAT CCAAGAAGCT TCTATACAAA CCCGTTAATC ACTATGAGGC 1620 
TTGGCAACAT ATTCAAGTGA CAAATGAGAT TGTTACACTC AACTATTTGG AACCTCGGAC 1680
AGAATATGAA CTCTGTGTGC AACTGGTCCG TCGTGGAGAG GGTGGGGAAG GGCATCCTGG 1740 
ACCTGTGAGA CGCTTCACAA CAGCTTCTAT CGGACTCCCT CCTCCAAGAG GTCTAAATCT 1800
CCTGCCTAAA AGTCAGACCA CTCTAAATTT GACCTGGCAA CCAATATTTC CAAGCTCGGA 1860
AGATGACTTT TATGTTGAAG TGGAGAGAAG GTCTGTGCAA AAAAGTGATC AGCAGAATAT 1920 
TAAAGTTCCA GGCAACTTGA CTTCGGTGCT ACTTAACAAC TTACATCCCA GGGAGCAGTA 1980 
CGTGGTCCGA GCTAGAGTCA ACACCAAGGC CCAGGGGGAA TGGAGTGAAG ATCTCACTGC 204 0 
TTGGACCCTT AGTGACATTC TTCCTCCTCA ACCAGAAAAC ATCAAGATTT CCAACATTAC 2100 
ACACTCCTCG GCTGTGATTT CTTGGACAAT ATTGGATGGC TATTCTATTT CTTCTATTAC 2160 
TATCCGTTAC AAGGTTCAAG GCAAGAATGA AGACCAGCAC GTTGATGTGA AGATAAAGAA 2220 
TGCCACCATC ATTCAGTATC AGCTCAAGGG CCTAGAGCCT GAAACAGCAT ACCAGGTGGA 2280
CATTTTTGCA GAGAACAACA TAGGGTCAAG CAACCCAGCC TTTTCTCATG AACTGGTGAC 2340
CCTCCCAGAA TCTCAAGCAC CAGCGGACCT CGGAGGGGGG AAGATGCTGC TTATAGCCAT 2400 
CCTTGGCTCT GCTGGAATGA CCTGCCTGAC TGTGCTGTTG GCCTTTCTGA TCATATTGCA 2460
ATTGAAGAGG GCAAATGTGC AAAGGAGAAT GGCCCAAGCC TTCCAAAACG TGAGGGAAGA 2520 
ACCAGCTGTG CAGTTCAACT CAGGGACTCT GGCCCTAAAC AGGAAGGTCA AAAACAACCC 2580 
AGATCCTACA ATTTATCCAG TGCTTGACTG GAATGACATC AAATTTCAAG ATGTGATTGG 2640
GGAGGGCAAT TTTGGCCAAG TTCTTAAGGC GCGCATCAAG AAGGATGGGT TACGGATGGA 2700 
TGCTGCCATC AAAAGAATGA AAGAATATGC CTCCAAAGAT GATCACAGGG ACTTTGCAGG 2760 
AGAACTGGAA GTTCTTTGTA AACTTGGACA CCATCCAAAC ATCATCAATC TCTTAGGAGC 2820
ATGTGAACAT CGAGGCTACT TGTACCTGGC CATTGAGTAC GCGCCCCATG GAAACCTTCT 2880 
GGACTTCCTT CGCAAGAGCC GTGTGCTGGA GACGGACCCA GCATTTGCCA TTGCCAATAG 2940 
CACCGCGTCC ACACTGTCCT CCCAGCAGCT CCTTCACTTC GCTGCCGACG TGGCCCGGGG 3000
CATGGACTAC TTGAGCCAAA AACAGTTTAT CCACAGGGAT CTGGCTGCCA GAAACATTTT 3060 
AGTTGGTGAA AACTATGTGG CAAAAATAGC AGATTTTGGA TTGTCCCGAG GTCAAGAGGT 3120 
GTACGTGAAA AAGACAATGG GAAGGCTCCC AGTGCGCTGG ATGGCCATCG AGTCACTGAA 3180 
TTACAGTGTG TACACAACCA ACAGTGATGT ATGGTCCTAT GGTGTGTTAC TATGGGAGAT 3240 
TGTTAGCTTA GGAGGCACAC CCTACTGCGG GATGACTTGT GCAGAACTCT ACGAGAAGCT 3300 
GCCCCAGGGC TACAGACTGG AGAAGCCCCT GAACTGTGAT GATGAGGTGT ATGATCTAAT 3360 
GAGACAATGC TGGCGGGAGA AGCCTTATGA GAGGCCATCA TTTGCCCAGA TATTGGTGTC 3420 
CTTAAACAGA ATGTTAGAGG AGCGAAAGAC CTACGTGAAT ACCACGCTTT ATGAGAAGTT 3480 
TACTTATGCA GGAATTGACT GTTCTGCTGA AGAAGCGGCC TAGGACAGAA CATCTGTATA 3540 
CCCTCTGTTT CCCTTTCACT GGCATGGGAG ACCCTTGACA ACTGCTGAGA AAACATGCCT 3600 
CTGCCAAAGG ATGTGATATA TAAGTGTACA TATGTGCTGG AATTCTAACA AGTCATAGGT 3660
TAATATTTAA GACACTGAAA AATCTAAGTG ATATAAATCA GATTCTTCTC TCTCATTTTA 3720
TCCCTCACCT GTAGCATGCC AGTCCCGTTT CATTTAGTCA TGTGACCACT CTGTCTTGTG 3780 
TTTCCACAGC CTGCAAGTTC AGTCCAGGAT GCTAACATCT AAAAATAGAC TTAAATCTCA 3840
TTGCTTACAA GCCTAAGAAT CTTTAGAGAA GTATACATAA GTTTAGGATA AAATAATGGG 3900
ATTTTCTTTT CTTTTCTCTG GTAATATTGA CTTGTATATT TTAAGAAATA ACAGAAAGCC 3960 
TGGGTGACAT TTGGGAGACA TGTGACATTT ATATATTGAA TTAATATCCC TACATGTATT 4020
GCACATTGTA AAAAGTTTTA GTTTTGATGA GTTGTGAGTT TACCTTGTAT ACTGTAGGCA 4080 
CACTTTGCAC TGATATATCA TGAGTGAATA AATGTCTTGC CTACTCAAAA AAAAAAAA 



PZA6 DNA sequence 

Gene name: prostate differentiation factor (PLAB; MIC-1)

Unigene number: Hs.116577

Probeset Accession #: AB000584 

Nucleic Acid Accession #: NM_004864 

Coding sequence: 26-952 (predicted start/stop codons underlined) 

CGGAACGAGG GCAACCTGCA CAGCCATGCC CGGGCAAGAA CTCAGGACGG TGAATGGCTC 60 

TCAGATGCTC CTGGTGTTGC TGGTGCTCTC GTGGCTGCCG CATGGGGGCG CCCTGTCTCT 120 

GGCCGAGGCG AGCCGCGCAA GTTTCCCGGG ACCCTCAGAG TTGCACTCCG AAGACTCCAG 180

ATTCCGAGAG TTGCGGAAAC GCTACGAGGA CCTGCTAACC AGGCTGCGGG CCAACCAGAG 240

CTGGGAAGAT TCGAACACCG ACCTCGTCCC GGCCCCTGCA GTCCGGATAC TCACGCCAGA 300 
AGTGCGGCTG GGATCCGGCG GCCACCTGCA CCTGCGTATC TCTCGGGCCG CCCTTCCCGA 360

GGGGCTCCCC GAGGCCTCCC GCCTTCACCG GGCTCTGTTC CGGCTGTCCC CGACGGCGTC 420

AAGGTCGTGG GACGTGACAC GACCGCTGCG GCGTCAGCTC AGCCTTGCAA GACCCCAAGC 480 

GCCCGCGCTG CACCTGCGAC TGTCGCCGCC GCCGTCGCAG TCGGACCAAC TGCTGGCAGA 540 

ATCTTCGTCC GCACGGCCCC AGCTGGAGTT GCACTTGCGG CCGCAAGCCG CCAGGGGGCG 600 

CCGCAGAGCG CGTGCGCGCA ACGGGGACGA CTGTCCGCTC GGGCCCGGGC GTTGCTGCCG 660 

TCTGCACACG GTCCGCGCGT CGCTGGAAGA CCTGGGCTGG GCCGATTGGG TGCTGTCGCC 720 

ACGGGAGGTG CAAGTGACCA TGTGCATCGG CGCGTGCCCG AGCCAGTTCC GGGCGGCAAA 780 

CATGCACGCG CAGATCAAGA CGAGCCTGCA CCGCCTGAAG CCCGACACGG AGCCAGCGCC 840 

CTGCTGCGTG CCCGCCAGCT ACAATCCCAT GGTGCTCATT CAAAAGACCG ACACCGGGGT 900

GTCGCTCCAG ACCTATGATG ACTTGTTAGC CAAAGACTGC CACTGCATAT GAGCAGTCCT 960

GGTCCTTCCA CTGTGCACCT GCGCGGGGGA GGCGACCTCA GTTGTCCTGC CCTGTGGAAT 1020

GGGCTCAAGG TTCCTGAGAC ACCCGATTCC TGCCCAAACA GCTGTATTTA TATAAGTCTG 1080 

TTATTTATTA TTAATTTATT GGGGTGACCT TCTTGGGGAC TCGGGGGCTG GTCTGATGGA 1140 

ACTGTGTATT TATTTAAAAC TCTGGTGATA AAAATAAAGC TGTCTGAACT GTTAAAAAAA 1200 
AAAA 
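Each entry's "Coding sequence:" line gives start and stop positions against the numbered listing. A minimal Python sketch, assuming those positions are 1-based and inclusive (consistent with PZA6, where position 26 begins ATG and positions 950-952 read TGA in the listing above), shows how a coding region could be cut out of a listing once the position numbers and spacing are stripped. The helper names `clean_listing`, `coding_sequence` and `looks_like_orf` are hypothetical and not part of the original disclosure.

```python
import re

# Keep only IUPAC nucleotide letters (e.g. "R" appears in AAC8);
# position numbers, spaces and line breaks are discarded.
_NON_BASE = re.compile(r"[^ACGTRYKMSWBDHVN]")

STOP_CODONS = {"TAA", "TAG", "TGA"}


def clean_listing(text: str) -> str:
    """Join a numbered listing into one uppercase base string."""
    return _NON_BASE.sub("", text.upper())


def coding_sequence(seq: str, start: int, stop: int) -> str:
    """Slice a CDS using 1-based, inclusive coordinates (assumed convention)."""
    if not 1 <= start <= stop <= len(seq):
        raise ValueError("coordinates fall outside the sequence")
    return seq[start - 1:stop]


def looks_like_orf(cds: str) -> bool:
    """True if the CDS is whole codons, starts with ATG and ends in a stop codon."""
    return len(cds) % 3 == 0 and cds.startswith("ATG") and cds[-3:] in STOP_CODONS


if __name__ == "__main__":
    # First numbered line of the PZA6 listing; its CDS is stated to start at 26.
    row = "CGGAACGAGG GCAACCTGCA CAGCCATGCC CGGGCAAGAA CTCAGGACGG TGAATGGCTC 60"
    seq = clean_listing(row)
    print(len(seq), coding_sequence(seq, 26, 28))  # prints: 60 ATG
```

Applied to the full PZA6 listing, `coding_sequence(seq, 26, 952)` would return 927 bases (309 codons). Whether the stated coordinates hold for every entry, particularly those with OCR damage, would need to be checked entry by entry.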



AAC8 DNA sequence 
Gene name: none
Unigene number: Hs.6682
Probeset Accession #: AA227926 
Nucleic Acid Accession #: none 

Coding sequence: no ORF identified, possible frameshifts 

AAGCTGCAGT TAGCCAAGAT CGCATCATTG CACTCCAGCC TAGGGGACAA GAGCGCGAGA 60 

CTTCATCTCA AAGATTTTTA AATAATAGCT AAAGGTATGC TCTCTAGGTC ATCCTTAGTT 120 

TATTAGTACT GTACTTAAAA ATTATTTTTT TAATAGTCAA TTTTGGGAGA TAATTATTTC 180
TTTCCTTATA TTTTCCAATT AGTTGGTGTC TAAAAATAAA TGTTTTGTCT AATTTTAGAT 240

CAGGTATACA TTCACAAAAG CATAAATCAT AGTCTCACAG GAAATTCACC AATTTTCCAT 300 

ATGTCGTGAG ATAACTGTCC TTTCTACAAC CTCATAACAA TGAATTTATA TAATTACCTA 360

GATTTTCTTA GTGTGAATCT ACCCATTAGT TTTATTTTCT TGGTAGTTAT TTTTTTCCCT 420

CCTCTCTGTT ACTATTGGCC TTAAAATACA CAGGAGGACG GTTACAGTGT CCTAATAGCT 480

GTTACATGTG TGTGTTTCAG CGTACTTGAA TCAAGTGTAC ATTTATAGTA CCAATAACCG 540 

CCTTTACAGC TTTACAGTTA ACAATTCTCT CACAAAACTG TAGAGCATTA GGCATCTGAG 600 

AGCCATAGAG GGCCAACTTT GTTCCAGAGT GAACATGCTT TTTTTCCTCA ACATATACAC 660 

TACTGATTTT TTTTAAAAGT ATGACTTTCA AGTGAATTAA TGTATTGGTT AGGAGAACTG 720

CTTGCTAAGT CCTTATTACC TCTTGTTAAA GCCTCAGAAG GCCGTGCTGA AAGCCAGAGG 780 

GGAAAAAAAG AGTAATGCAC AGGTATCTCT TTTGCAGTGG TGACTGTATT TTGAGTACCT 840 

TGTGTGACAG GGTATTATTA CAGCATCTTG TGGGAAAACC TATTAGGCCT TTGCATGTTA 900 

AAGCTGTATA ATTTGTTGGG TTGTGAGTGG TCTGACTTAA ATGTGTATTA TAAAATTTAG 960 

ACATCAAATT TTCCTACTAA CTAACTTTAT TAGATGCATA CTTGGAAGCA CAGTCATATC 1020

ACACTGGGAG GCAATGCAAT GTGGTTACCT GGTCCTAGGT TTGAACTGTC TTATTTCAAA 1080 

AGATTTCTGA ATTAATTTTT CCCTAGAATT TCTCCTTCAT TCCAAAGTAC AAACATACTT 1140 

TGAAGAATGA AACAGATTGT TCCCATGAAT GTATGCTCAT ACTCGACTAG AAACGATCTA 1200 

TGTTAAATGA CTGTGTATAT GAATTATTTC AAGTACTACC CCAAATAACT TTCTTATTGC 1260

TCTGAAAGAA GAAAAGCAAT GTAAATCACT ATGATTATTG CACAAACAAC CAGAATTCTC 1320 

CAACAATTTT AAGTAATCTG ATCCTCTTCT TGGAGAAAAT TGTTACCTAA TAGTTTTTCC 1380 

TTATGAATGT TATTACTACT GGTATAAATC AAATTTCTAT AAATTTCCTA CTTAAAGTCT 1440

TAARAACTGG GTTCTTCCTT TGATGTTATT CATGTTCAGA AAGGGAAACA ACACTTTACT 1500 

TTTTTAGGGA CAATTTCTAG AATCTATAGT AGTATCAGGA TATATTTTGC TTTAAAATAT 1560 

ATTTTGGTTA TTTTGAATAC AGACATTGGC TCCAAATTTT CATCTTTGCA CAATAGTATG 1620

ACTTTTCACT AGAACTTCTC AACATTTGGG AACTTTGCAA ATATGAGCAT CATATGTGTT 1680 

AAGGCTGTAT CATTTAATGC TATGAGATAC ATTGTTTTCT CCCTATGCCA AACAGGTGAA 1740

CAAACGTAGT TGTTTTTTAC TGATACTAAA TGTTGGCTAC CTGTGATTTT ATAGTATGCA 1800

CATGTCAGAA AAAGGCAAGA CAAATGGCCT CTTGTACTGA ATACTTCGGC AAACTTATTG 1860 

GGGTCTTCAT TTTCTGACAG ACAGGATTTG ACTCAATATT TGTAGAGCTT GCGTAGGAAT 1920 

GGGATTACAT GGGTAGTGAT GCACTGGTAG GAAATGGTTT TTAGTTATTG ACTCAGGAAT 1980

TCATCT* \GG ATGAATCTTT TATGTCTTTT TATTGTAAGG CATATCTGGA ATTTACTTTA 2040 

TAAAGG r ^ sGG GTTTAGGAAA GCTTTGTCCT AAAAATTGGG CCCCGGGGAT GGGAACTT C A 2100 

TTTTCAGTTG CCAAGGGGTA GAAAAATAAT ATGTGTGTTG TTATGTTTAT GTTAACATAT 2160

TATTAGGTAC TATCTATGAA TGTATTTAAA TATTTTTCAT ATTCTGTGAC AAGCATTTAT 2220

AATTTGCAAC AAGTGGAGTC CATTTAGCCC AGTGGGAAAG TCTTGGAACT CAGGTTACCC 2280

TTGAAGGATA TGCTGGCAGC CATCTCTTTG ATCTGTGCTT AAACTGTAAT TTATAGACCA 2340 

GCTAAATCCC TAACTTGGAT CTGGAATGCA TTAGTTATGA CCTTGTACCA TTCCCAGAAT 2400 

TTCAGGGGCA TCGTGGGTTT GGTCTAGTGA TTGAAAACAC AAGAACAGAG AGATCCAGCT 2460 

GAAAAAGAGT GATCCTCAAT ATCCTAACTA ACTGGTCCTC AACTCAAGCA GAGTTTCTTC 2520 

ACTCTGGCAC TGTGATCATG AAACTTAGTA GAGGGGATTG TGTGTATTTT ATACAAATTT 2580 
AATACAATGT CTTACATTGA TAAAATTCTT AAAGAGCAAA ACTGCATTTT ATTTCTGCAT 2640

CCACATTCCA ATCATATTAG AACTAAGATA TTTATCTATG AAGATATAAA TGGTGCAGAG 2700 

AGACTTTCAT CTGTGGATTG CGTTGTTTCT CTAGGGTTCC TCAGCCACTG ATGCCTCGCC 2760 

ACAAGCCATG TGATATGTGA AATAAAAAGG GATTCTTCCT ATAGCCTAAA TGAAGTTCCC 2820 

TCTGGGGAGA GTTCTGGTAC TGCAATCACA ATGCCAGATG GTGTTTATGG GCTATTTGTG 2880 

TAAGTAAGTG GTAAGATGCT ATGAAGTAAG TGTGTTTGTT TTCATCTTAT GGAAACTCTT 2940 

GATGCATGTG CTTTTGTATG GAATAAATTT TGGTGCAATA TGATGTCATT CAACTTTGCA 3000 

TTGAATTGAA TTTTGGTTGT ATTTATATGT ATTATACCTG TCACGCTTCT AGTTGCTTCA 3060

ACCATTTTAT AACCATTTTT GTACATATTT TACTTGAAAA TATTTTAAAT GGAAATTTAA 3120
ATAAACATTT GATAGTTTAC ATAAAAAAAA AAAAAAAAAA A 

AAD2 DNA sequence 

Gene name: Thrombospondin-1

Unigene number: Hs.87409

Probeset Accession #: AA232645 

Nucleic Acid Accession #: NM_003246 

Coding sequence: 112-3624 (predicted start/stop codons underlined) 

GGACGCACAG GCATTCCCCG CGCCCCTCCA GCCCTCGCCG CCCTCGCCAC CGCTCCCGGC 60 

CGCCGCGCTC CGGTACACAC AGGATCCCTG CTGGGCACCA ACAGCTCCAC CATGGGGCTG 120 

GCCTGGGGAC TAGGCGTCCT GTTCCTGATG CATGTGTGTG GCACCAACCG CATTCCAGAG 180 

TCTGGCGGAG ACAACAGCGT GTTTGACATC TTTGAACTCA CCGGGGCCGC CCGCAAGGGG 240 

TCTGGGCGCC GACTGGTGAA GGGCCCCGAC CCTTCCAGCC CAGCTTTCCG CATCGAGGAT 300

GCCAACCTGA TCCCCCCTGT GCCTGATGAC AAGTTCCAAG ACCTGGTGGA TGCTGTGCGG 360

GCAGAAAAGG GTTTCCTCCT TCTGGCATCC CTGAGGCAGA TGAAGAAGAC CCGGGGCACG 420

CTGCTGGCCC TGGAGCGGAA AGACCACTCT GGCCAGGTCT TCAGCGTGGT GTCCAATGGC 480

AAGGCGGGCA CCCTGGACCT CAGCCTGACC GTCCAAGGAA AGCAGCACGT GGTGTCTGTG 540

GAAGAAGCTC TCCTGGCAAC CGGCCAGTGG AAGAGCATCA CCCTGTTTGT GCAGGAAGAC 600 

AGGGCCCAGC TGTACATCGA CTGTGAAAAG ATGGAGAATG CTGAGTTGGA CGTCCCCATC 660

CAAAGCGTCT TCACCAGAGA CCTGGCCAGC ATCGCCAGAC TCCGCATCGC AAAGGGGGGC 720 

GTCAATGACA ATTTCCAGGG GGTGCTGCAG AATGTGAGGT TTGTCTTTGG AACCACACCA 780 

GAAGACATCC TCAGGAACAA AGGCTGCTCC AGCTCTACCA GTGTCCTCCT CACCCTTGAC 840 

AACAACGTGG TGAATGGTTC CAGCCCTGCC ATCCGCACTA ACTACATTGG CCACAAGACA 900

AAGGACTTGC AAGCCATCTG CGGCATCTCC TGTGATGAGC TGTCCAGCAT GGTCCTGGAA 960 

CTCAGGGGCC TGCGCACCAT TGTGACCACG CTGCAGGACA GCATCCGCAA AGTGACTGAA 1020 

GAGAACAAAG AGTTGGCCAA TGAGCTGAGG CGGCCTCCCC TATGCTATCA CAACGGAGTT 1080 

CAGTACAGAA ATAACGAGGA ATGGACTGTT GATAGCTGCA CTGAGTGTCA CTGTCAGAAC 1140 

TCAGTTACCA TCTGCAAAAA GGTGTCCTGC CCCATCATGC CCTGCTCCAA TGCCACAGTT 1200 

CCTGATGGAG AATGCTGTCC TCGCTGTTGG CCCAGCGACT CTGCGGACGA TGGCTGGTCT 1260 

CCATGGTCCG AGTGGACCTC CTGTTCTACG AGCTGTGGCA ATGGAATTCA GCAGCGCGGC 1320 

CGCTCCTGCG ATAGCCTCAA CAACCGATGT GAGGGCTCCT CGGTCCAGAC ACGGACCTGC 1380 

CACATTCAGG AGTGTGACAA AAGATTTAAA CAGGATGGTG GCTGGAGCCA CTGGTCCCCG 1440 

TGGTCATCTT GTTCTGTGAC ATGTGGTGAT GGTGTGATCA CAAGGATCCG GCTCTGCAAC 1500

TCTCCCAGCC CCCAGATGAA TGGGAAACCC TGTGAAGGCG AAGCGCGGGA GACCAAAGCC 1560 

TGCAAGAAAG ACGCCTGCCC CATCAATGGA GGCTGGGGTC CTTGGTCACC ATGGGACATC 1620 

TGTTCTGTCA CCTGTGGAGG AGGGGTACAG AAACGTAGTC GTCTCTGCAA CAACCCCGCA 1680 

CCCCAGTTTG GAGGCAAGGA CTGCGTTGGT GATGTAACAG AAAACCAGAT CTGCAACAAG 1740

CAGGACTGTC CAATTGATGG ATGCCTGTCC AATCCCTGCT TTGCCGGCGT GAAGTGTACT 1800

AGCTACCCTG ATGGCAGCTG GAAATGTGGT GCTTGTCCCC CTGGTTACAG TGGAAATGGC 1860 

ATCCAGTGCA CAGATGTTGA TGAGTGCAAA GAAGTGCCTG ATGCCTGCTT CAACCACAAT 1920

GGAGAGCACC GGTGTGAGAA CACGGACCCC GGCTACAACT GCCTGCCCTG CCCCCCACGC 1980 

TTCACCGGCT CACAGCCCTT CGGCCAGGGT GTCGAACATG CCACGGCCAA CAAACAGGTG 2040

TGCAAGCCCC GTAACCCCTG CACGGATGGG ATCCACGACT GCAACAAGAA CGCCAAGTGC 2100

AACTACCTGG GCCACTATAG CGACCCCATG TACCGCTGCG AGTGCAAGCC TGGCTACGCT 2160 

GGCAATGGCA TCATCTGCGG GGAGGACACA GACCTGGATG GCTGGCCCAA TGAGAACCTG 2220

GTGTGCGTGG CCAATGCGAC TTACCACTGC AAAAAGGATA ATTGCCCCAA CCTTCCCAAC 2280 

TCAGGGCAGG AAGACTATGA CAAGGATGGA ATTGGTGATG CCTGTGATGA TGACGATGAC 2340

AATGATAAAA TTCCAGATGA CAGGGACAAC TGTCCATTCC ATTACAACCC AGCTCAGTAT 2400

GACTATGACA GAGATGATGT GGGAGAC~GC TGTGACAACT GTCCCTACAA CCACAACCCA 2460 

GATCAGGCAG ACACAGACAA CAATGGG«; AA GGAGACGCCT GTGCTGCAGA CATTGATGGA 2520 

GACGGTATCC TCAATGAACG GGACAACTGC CAGTACGTCT ACAATGTGGA CCAGAGAGAC 2580

ACTGATATGG ATGGGGTTGG AGATCAGTGT GACAATTGCC CCTTGGAACA CAATCCGGAT 2640 

CAGCTGGACT CTGACTCAGA CCGCATTGGA GATACCTGTG ACAACAATCA GGATATTGAT 2700

GAAGATGGCC ACCAGAACAA TCTGGACAAC TGTCCCTATG TGCCCAATGC CAACCAGGCT 2760

GACCATGACA AAGATGGCAA GGGAGATGCC TGTGACCACG ATGATGACAA CGATGGCATT 2820 

CCTGATGACA AGGACAACTG CAGACTCGTG CCCAATCCCG ACCAGAAGGA CTCTGACGGC 2880 

GATGGTCGAG GTGATGCCTG CAAAGATGAT TTTGACCATG ACAGTGTGCC AGACATCGAT 2940

GACATCTGTC CTGAGAATGT TGACATCAGT GAGACCGATT TCCGCCGATT CCAGATGATT 3000

CCTCTGGACC CCAAAGGGAC ATCCCAAAAT GACCCTAACT GGGTTGTACG CCATCAGGGT 3060

AAAGAACTCG TCCAGACTGT CAACTGTGAT CCTGGACTCG CTGTAGGTTA TGATGAGTTT 3120 

AATGCTGTGG ACTTCAGTGG CACCTTCTTC ATCAACACCG AAAGGGACGA TGACTATGCT 3180 

GGATTTGTCT TTGGCTACCA GTCCAGCAGC CGCTTTTATG TTGTGATGTG GAAGCAAGTC 3240 

ACCCAGTCCT ACTGGGACAC CAACCCCACG AGGGCTCAGG GATACTCGGG CCTTTCTGTG 3300 

AAAGTTGTAA ACTCCACCAC AGGGCCTGGC GAGCACCTGC GGAACGCCCT GTGGCACACA 3360 

GGAAACACCC CTGGCCAGGT GCGCACCCTG TGGCATGACC CTCGTCACAT AGGCTGGAAA 3420 

GATTTCACCG CCTACAGATG GCGTCTCAGC CACAGGCCAA AGACGGGTTT CATTAGAGTG 3480 

GTGATGTATG AAGGGAAGAA AATCATGGCT GACTCAGGAC CCATCTATGA TAAAACCTAT 3540 

GCTGGTGGTA GACTAGGGTT GTTTGTCTTC TCTCAAGAAA TGGTGTTCTT CTCTGACCTG 3600 

AAATACGAAT GTAGAGATCC CTAATCATCA AATTGTTGAT TGAAAGACTG ATCATAAACC 3660 

AATGCTGGTA TTGCACCTTC TGGAACTATG GGCTTGAGAA AACCCCCAGG ATCACTTCTC 3720 

CTTGGCTTCC TTCTTTTCTG TGCTTGCATC AGTGTGGACT CCTAGAACGT GCGACCTGCC 3780

TCAAGAAAAT GCAGTTTTCA AAAACAGACT CATCAGCATT CAGCCTCCAA TGAATAAGAC 3840

ATCTTCCAAG CATATAAACA ATTGCTTTGG TTTCCTTTTG AAAAAGCATC TACTTGCTTC 3900 

AGTTGGGAAG GTGCCCATTC CACTCTGCCT TTGTCACAGA GCAGGGTGCT ATTGTGAGGC 3960 

CATCTCTGAG CAGTGGACTC AAAAGCATTT TCAGGCATGT CAGAGAAGGG AGGACTCACT 4020 

AGAATTAGCA AACAAAACCA CCCTGACATC CTCCTTCAGG AACACGGGGA GCAGAGGCCA 4080 

AAGCACTAAG GGGAGGGCGC ATACCCGAGA CGATTGTATG AAGAAAATAT GGAGGAACTG 4140 

TTACATGTTC GGTACTAAGT CATTTTCAGG GGATTGAAAG ACTATTGCTG GATTTCATGA 4200 

TGCTGACTGG CGTTAGCTGA TTAACCCATG TAAATAGGCA CTTAAATAGA AGCAGGAAAG 4260 

GGAGACAAAG ACTGGCTTCT GGACTTCCTC CCTGATCCCC ACCCTTACTC ATCACCTTGC 4320 

AGTGGCCAGA ATTAGGGAAT CAGAATCAAA CCAGTGTAAG GCAGTGCTGG CTGCCATTGC 4380 

CTGGTCACAT TGAAATTGGT GGCTTCATTC TAGATGTAGC TTGTGCAGAT GTAGCAGGAA 4440 

AATAGGAAAA CCTACCATCT CAGTGAGCAC CAGCTGCCTC CCAAAGGAGG GGCAGCCGTG 4500

CTTATATTTT TATGGTTACA ATGGCACAAA ATTATTATCA ACCTAACTAA AACATTCCTT 4560 

TTCTCTTTTT TCCGTAATTA CTAGGTAGTT TTCTAATTCT CTCTTTTGGA AGTATGATTT 4620 

TTTTAAAGTC TTTACGATGT AAAATATTTA TTTTTTACTT ATTCTGGAAG ATCTGGCTGA 4680 

AGGATTATTC ATGGAACAGG AAGAAGCGTA AAGACTATCC ATGTCATCTT TGTTGAGAGT 4740

CTTCGTGACT GTAAGATTGT AAATACAGAT TATTTATTAA CTCTGTTCTG CCTGGAAATT 4800 

TAGGCTTCAT ACGGAAAGTG TTTGAGAGCA AGTAGTTGAC ATTTATCAGC AAATCTCTTG 4860 

CAAGAACAGC ACAAGGAAAA TCAGTCTAAT AAGCTGCTCT GCCCCTTGTG CTCAGAGTGG 4920 

ATGTTATGGG ATTCCTTTTT TCTCTGTTTT ATCTTTTCAA GTGGAATTAG TTGGTTATCC 4980 

ATTTGCAAAT GTTTTAAATT GCAAAGAAAG CCATGAGGTC TTCAATACTG TTTTACCCCA 5040 

TCCCTTGTGC ATATTTCCAG GGAGAAGGAA AGCATATACA CTTTTTTCTT TCATTTTTCC 5100 

AAAAGAGAAA AAAATGACAA AAGGTGAAAC TTACATACAA ATATTACCTC ATTTGTTGTG 5160 

TGACTGAGTA AAGAATTTTT GGATCAAGCG GAAAGAGTTT AAGTGTCTAA CAAACTTAAA 5220 

GCTACTGTAG TACCTAAAAA GTCAGTGTTG TACATAGCAT AAAAACTCTG CAGAGAAGTA 5280 

TTCCCAATAA GGAAATAGCA TTGAAATGTT AAATACAATT TCTGAAAGTT ATGTTTTTTT 5340

TCTATCATCT GGTATACCAT TGCTTTATTT TTATAAATTA TTTTCTCATT GCCATTGGAA 5400 

TAGAATATTC AGATTGTGTA GATATGCTAT TTAAATAATT TATCAGGAAA TACTGCCTGT 5460

AGAGTTAGTA TTTCTATTTT TATATAATGT TTGCACACTG AATTGAAGAA TTGTTGGTTT 5520 

TTTCTTTTTT TTGTTTTTTT TTTTTTTTTT TTTTTTTTTG CTTTTGACCT CCCATTTTTA 5580 

CTATTTGCCA ATACCTTTTT CTAGGAATGT GCTTTTTTTT GTACACATTT TTATCCATTT 5640

TACATTCTAA AGCAGTGTAA GTTGTATATT ACTGTTTCTT ATGTACAAGG AACAACAATA 5700 
AATCATATGG AAATTTATAT TT 



AAD9 DNA sequence 

Gene name: LIM homeobox protein cofactor (CLIM-1)
Unigene number: Hs.4980 
Probeset Accession #: F13782 
Nucleic Acid Accession #: AF047337 

Coding sequence: 110-1231 (predicted start/stop codons underlined) 

GTGAGCGTGT GTGCGTGCGT CTACTTTGTA CTGGGAAGAA CACAGCCCAT GTGCTCTGCA 60

TGGACGTTAC TGATACTCTG TTTAGCTTGA TTTTCGAAAA GCAGGCAAGA TGTCCAGCAC 120 

ACCACATGAC CCCTTCTATT CTTCTCCTTT CGGCCCATTT TATAGGAGGC ATACACCATA 180

CATGGTACAG CCAGAGTACC GAATCTATGA GATGAACAAG AGACTGC^ IT CTCGCACAGA 240

GGATAGTGAC AACCTCTGGT GGGACGCCTT TGCCACTGAA TTTTTTG' \G ATGACGCCAC 300

ATTAACCCTT TCATTTTGTT TGGAAGATGG ACCAAAGCGA TACACTATCG GCAGGACCCT 360

CATCCCCCGT TACTTTAGCA CTGTGTTTGA AGGAGGGGTG ACCGACCTGT ATTACATTCT 420 

CAAACACTCG AAAGAGTCAT ACCACAACTC ATCCATCACG GTGGACTGCG ACCAGTGTAC 480

CATGGTCACC CAGCACGGGA AGCCCATGTT TACCAAGGTA TGTACAGAAG GCAGACTGAT 540 

CTTGGAGTTC ACCTTTGATG ATCTCATGAG AATCAAAACA TGGCACTTTA CCATTAGACA 600 

ATACCGAGAG TTAGTCCCGA GAAGCATCCT AGCCATGCAT GCACAAGATC CTCAGGTCCT 660 

GGATCAGCTG TCCAAAAACA TCACCAGGAT GGGGCTAACA AACTTCACCC TCAACTACCT 720 

CAGGTTGTGT GTAATATTGG AGCCAATGCA GGAACTGATG TCGAGACATA AAACTTACAA 780

CCTCAGTCCC CGAGACTGCC TGAAGACCTG CTTGTTTCAG AAGTGGCAGA GGATGGTGGC 840

TCCGCCAGCA GAACCCACAA GGCAACCAAC AACCAAACGG AGAAAAAGGA AAAATTCCAC 900

CAGCAGCACT TCCAACAGCA GCGCTGGGAA CAATGCAAAC AGCACTGGCA GCAAGAAGAA 960 

GACCACAGCT GCAAACCTGA GTCTGTCCAG TCAGGTACCT GATGTGATGG TGGTAGGAGA 1020 

GCCAACTCTG ATGGGAGGTG AGTTTGGGGA CGAGGACGAA AGGCTAATCA CTAGATTAGA 1080 

AAACACGCAA TATGATGCGG CCAACGGCAT GGACGACGAG GAGGACTTCA ACAATTCACC 1140 

CGCGCTGGGG AACAACAGCC CGTGGAACAG TAAACCTCCC GCCACTCAAG AGACCAAATC 1200 

AGAAAACCCC CCACCCCAGG CTTCCCAATA AGATGATCGG CACCAGAATC CACTGTCAAT 1260

AGGCCCGTGG GTGATCATTA CAATTGCAAA TCTTTACTTA CAGGAGAGGA AACAGAAGAG 1320

ATAAAAACTT TTCCATGCAA ATATCTATTT CTAAACCACA ATGATCTGAT TTTCTTTCTT 1380 

CTTTCTTTTT TTCTAATTGA GAGGATTATT CCCAGTAAGC TTCCATGACC CTTTCTTGGA 1440

GGCCTTCACA GGTAATACAG ATACTGGCAC TGATTGTAAT TAAAATGAGA GAAAACTCTA 1500

GCGCATCTTC TGGCACGGTT TTAACAACGT GTTTGTGTTG AATTTCCTTT TTATGCATCA 1560 

AACGAAGGCC ATATTGTCCA TAAATGCTCA GTGCTCAGGA TCTCATTAAT ATGCCGAACC 1620 

TAACTACAGA TGACTTTTTA ATATTGTAAA ATATTTTCTG CTTTTTGACT TGCATCTGAG 1680 

AGTTTCTTGT TTCAGTAAAA AAAGAAAAGA CAAAAAAATC AGCTTTGGAA AGTAATTTAA 1740

ATGTACCTTA TTTTTTTTTT CTTTATGTTT TCTTTCATTG GGCAACAGCT AAGAGGGCCC 1800 

AGCAAGGTAA TTTATGGTTG AGCTGATGTC AATTGGTTCT TGTCTTGAGT CGACTCAATT 1860

TAGCCCAAGT GCTGAAACAA GAAATGTCAT TTTTTTCATC AAAGACACCA GGGCAGATTT 1920 

TTAAGTAAAG AAAGACAATT GGACCCTTAA GAATTTATGC ATTTGTAAAG TTGCTGTTGA 1980

TCCAAATATT TTCAAGCCAT GTAATCCATT GGTTTTGTGG GCAGTTTAAT AAACCTGAAC 2040 

CTTTGTGTGT TTTCTAATTG TACCTGAGTT GACCATCCTT TCTTTTTATA GTATATTTCT 2100

TGTATGATAT TTTGTAAAGC TCTCACCTGG TTCTTTTATG GGGACTTTTC GTTTTTGGGC 2160 

AACTCCAGTG TATTTATGTG AAACTTTATA AGAGAATTAA TTTTTCCATT TGCATATTAA 2220

TATGTTCCTC CACACATGTA AAGGCACAGT GGCTCCGTGT GTTAAAAAAC AGCTGTATTT 2280 
TATGTATGCT TTACTGATAA GTGTGCCAAT AATAAACTGT GTTAATGACC 



AAE1 DNA sequence 

Gene name: guanine nucleotide binding protein 11 

Unigene number: Hs.83381 

Probeset Accession #: U31384

Nucleic Acid Accession #: NM_004126.1

Coding sequence: 108-329 (predicted start/stop codons underlined)

GGCACGAGCT CGTGCCGGCC TTCAGTTGTT TCGGGACGCG CCGAGCTTCG CCGCTCTTCC 60 

AGCGGCTCCG CTGCCAGAGC TAGCCCGAGC CCGGTTCTGG GGCGAAAATG CCTGCCCTTC 120

ACATCGAAGA TTTGCCAGAG AAGGAAAAAC TGAAAATGGA AGTTGAGCAG CTTCGCAAAG 180

AAGTGAAGTT GCAGAGACAA CAAGTGTCTA AATGTTCTGA AGAAATAAAG AACTATATTG 240

AAGAACGTTC TGGAGAGGAT CCTCTAGTAA AGGGAATTCC AGAAGACAAG AACCCCTTTA 300

AAGAAAAAGG CAGCTGTGTT ATTTCATAAA TAACTTGGGA GAAACTGCAT CCTAAGTGGA 360

AGAACTAGTT TGTTTTAGTT TTCCCAGATA AAACCAACAT GCTTTTTAAG GAAGGAAGAA 420 

TGAAATTAAA AGGAGACTTT CTTAAGCACC ATATAGATAG GGTTATGTAT AAAAGCATAT 480

GTGCTACTCA TCTTTGCTCA CTATGCAGTC TTTTTTAAGA GAGCAGAGAG TATCAGATGT 540 

ACAATTATGG AAATAAGAAC ATTACTTGAG CATGACACTT CTTTCAGTAT ATTGCTTGAT 600 
GCTTCAAATA AAGTTTTGTC TT 



AAB2 DNA sequence

Gene name: Transcription factor 4 (immunoglobulin transcription factor 2) (ITF-2) (SL3-3 enhancer factor 2) (SEF-2)
Unigene number: Hs.289068
Probeset Accession #: M74719
Nucleic Acid Accession #: NM_003199.1

Coding sequence: 200-2203 (predicted start/stop codons underlined) 

CGGGGGGATC TTGGCTGTGT GTCTGCGGAT CTGTAGTGGC GGCGGCGGCG GCGGCGGCGG 60 

GGAGGCAGCA GGCGCGGGAG CGGGCGCAGG AGCAGGCGGC GGCGGTGGCG GCGGCGGTTA 120 

GACATGAACG CCGCCTCGGC GCCGGCGGTG CACGGAGAGC CCCTTCTCGC GCGCGGGCGG 180

TTTGTGTGAT TTTGCTAAAA TGCATCACCA ACAGCGAATG GCTGCCTTAG GGACGGACAA 240

AGAGCTGAGT GATTTACTGG ATTTCAGTGC GATGTTTTCA CCTCCTGTGA GCAGTGGGAA 300

AAATGGACCA ACTTCTTTGG CAAGTGGACA TTTTACTGGC TCAAATGTAG AAGACAGAAG 360

TAGCTCAGGG TCCTGGGGGA ATGGAGGACA TCCAAGCCCG TCCAGGAACT ATGGAGATGG 420 

GACTCCCTAT GACCACATGA CCAGCAGGGA CCTTGGGTCA CATGACAATC TCTCTCCACC 480

TTTTGTCAAT TCCAGAATAC AAAGTAAAAC AGAAAGGGGC TCATACTCAT CTTATGGGAG 540

AGAATCAAAC TTACAGGGTT GCCACCAGCA GAGTCTCCTT GGAGGTGACA TGGATATGGG 600

CAACCCAGGA ACCCTTTCGC CCACCAAACC TGGTTCCCAG TACTATCAGT ATTCTAGCAA 660 

TAATCCCCGA AGGAGGCCTC TTCACAGTAG TGCCATGGAG GTACAGACAA AGAAAGTTCG 720

AAAAGTTCCT CCAGGTTTGC CATCTTCAGT CTATGCTCCA TCAGCAAGCA CTGCCGACTA 780

CAATAGGGAC TCGCCAGGCT ATCCTTCCTC CAAACCAGCA ACCAGCACTT TCCCTAGCTC 840 

CTTCTTCATG CAAGATGGCC ATCACAGCAG TGACCCTTGG AGCTCCTCCA GTGGGATGAA 900 

TCAGCCTGGC TATGCAGGAA TGTTGGGCAA CTCTTCTCAT ATTCCACAGT CCAGCAGCTA 960 

CTGTAGCCTG CATCCACATG AACGTTTGAG CTATCCATCA CACTCCTCAG CAGACATCAA 1020 

TTCCAGTCTT CCTCCGATGT CCACTTTCCA TCGTAGTGGT ACAAACCATT ACAGCACCTC 1080 

TTCCTGTACG CCTCCTGCCA ACGGGACAGA CAGTATAATG GCAAATAGAG GAAGCGGGGC 1140 

AGCCGGCAGC TCCCAGACTG GAGATGCTCT GGGGAAAGCA CTTGCTTCGA TCTATTCTCC 1200 

AGATCACACT AACAACAGCT TTTCATCAAA CCCTTCAACT CCTGTTGGCT CTCCTCCATC 1260 

TCTCTCAGCA GGCACAGCTG TTTGGTCTAG AAATGGAGGA CAGGCCTCAT CGTCTCCTAA 1320 

TTATGAAGGA CCCTTACACT CTTTGCAAAG CCGAATTGAA GATCGTTTAG AAAGACTGGA 1380 

TGATGCTATT CATGTTCTCC GGAACCATGC AGTGGGCCCA TCCACAGCTA TGCCTGGTGG 1440

TCATGGGGAC ATGCATGGAA TCATTGGACC TTCTCATAAT GGAGCCATGG GTGGTCTGGG 1500 

CTCAGGGTAT GGAACCGGCC TTCTTTCAGC CAACAGACAT TCACTCATGG TGGGGACCCA 1560 

TCGTGAAGAT GGCGTGGCCC TGAGAGGCAG CCATTCTCTT CTGCCAAACC AGGTTCCGGT 1620 

TCCACAGCTT CCTGTCCAGT CTGCGACTTC CCCTGACCTG AACCCACCCC AGGACCCTTA 1680 

CAGAGGCATG CCACCAGGAC TACAGGGGCA GAGTGTCTCC TCTGGCAGCT CTGAGATCAA 1740 

ATCCGATGAC GAGGGTGATG AGAACCTGCA AGACACGAAA TCTTCGGAGG ACAAGAAATT 1800 

AGATGACGAC AAGAAGGATA TCAAATCAAT TACTAGCAAT AATGACGATG AGGACCTGAC 1860 

ACCAGAGCAG AAGGCAGAGC GTGAGAAGGA GCGGAGGATG GCCAACAATG CCCGAGAGCG 1920 

TCTGCGGGTC CGTGACATCA ACGAGGCTTT CAAAGAGCTC GGCCGCATGG TGCAGCTCCA 1980 

CCTCAAGAGT GACAAGCCCC AGACCAAGCT CCTGATCCTC CACCAGGCGG TGGCCGTCAT 2040 

CCTCAGTCTG GAGCAGCAAG TCCGAGAAAG GAATCTGAAT CCGAAAGCTG CGTGTCTGAA 2100 

AAGAAGGGAG GAAGAGAAGG TGTCCTCGGA GCCTCCCCCT CTCTCCTTGG CCGGCCCACA 2160 

CCCTGGAATG GGAGACGCAT CGAATCACAT GGGACAGATG TAAAAGGGTC CAAGTTGCCA 2220 

CATTGCTTCA TTAAAACAAG AGACCACTTC CTTAACAGCT GTATTATCTT AAACCCACAT 2280 

AAACACTTCT CCTTAACCCC CATTTTTGTA ATATAAGACA AGTCTGAGTA GTTATGAATC 2340

GCAGACGCAA GAGGTTTCAG CATTCCCAAT TATCAAAAAA CAGAAAAACA AAAAAAAGAA 2400 

AGAAAAAAGT GCAACTTGAG GGACGACTTT CTTTAACATA TCATTCAGAA TGTGCAAAGC 2460 
AGTATGTACA GGCTGAGACA CAGCCCAGAG ACTGAACGGC 



AAE4 DNA sequence 

Gene name: phosphatidylcholine 2-acyl hydrolase
Unigene number: Hs.211587
Probeset Accession #: M68874

Nucleic Acid Accession #: M68874
Coding sequence: 139-2388 (predicted start/stop codons underlined) 

GAATTCTCCG GAGCTGAAAA AGGATCCTGA CTGAAAGCTA GAGGCATTGA GGAGCCTGAA 60 

GATTCTCAGG TTTTAAAGAC GCTAGAGTGC CAAAGAAGAC TTTGAAGTGT GAAAACATTT 120 

CCTGTAATTG AAACCAAAAT GTCATTTATA GATCCTTACC AGCACATTAT AGTGGAGCAC 180

CAGTATTCCC ACAAGTTTAC GGTAGTGGTG TTACGTGCCA CCAAAGTGAC AAAGGGGGCC 240 

TTTGGTGACA TGCTTGATAC TCCAGATCCC TATGTGGAAC TTTTTATCTC TACAACCCCT 300 

GACAGCAGGA AGAGAACAAG ACATTTCAAT AATGACATAA ACCCTGTGTG GAATGAGACC 360 

TTTGAATTTA TTTTGGATCC TAATCAGGAA AATGTTTTGG AGATTACGTT AATGGATGCC 420 

AATTATGTCA TGGATGAAAC TCTAGGGACA GCAACATTTA CTGTATCTTC TATGAAGGTG 480 

GGAGAAAAGA AAGAAGTTCC TTTTATTTTC AACCAAGTCA CTGAAATGGT TCTAGAAATG 540

TCTCTTGAAG TTTGCTCATG CCCAGACCTA CGATTTAGTA TGGCTCTGTG TGATCAGGAG 600

AAGACTTTCA GACAACAGAG AAAAGAACAC ATAAGGGAGA GCATGAAGAA ACTCTTGGGT 660

CCAAAGAATA GTGAAGGATT GCATTCTGCA CGTGATGTGC CTGTGGTAGC CATATTGGGT 720 

TCAGGTGGGG GTTTCCGAGC CATGGTGGGA TTCTCTGGTG TGATGAAGGC ATTATACGAA 780 

TCAGGAATTC TGGATTGTGC TACCTACGTT GCTGGTCTTT CTGGCTCCAC CTGGTATATG 840

TCAACCTTGT ATTCTCACCC TGATTTTCCA GAGAAAGGGC CAGAGGAGAT TAATGAAGAA 900 

CTAATGAAAA ATGTTAGCCA CAATCCCCTT TTACTTCTCA CACCACAGAA AGTTAAAAGA 960

TATGTTGAGT CTTTATGGAA GAAGAAAAGC TCTGGACAAC CTGTCACCTT TACTGACATC 1020

TTTGGGATGT TAATAGGAGA AACACTAATT CATAATAGAA TGAATACTAC TCTGAGCAGT 1080

TTGAAGGAAA AAGTTAATAC TGCACAATGC CCTTTACCTC TTTTCACCTG TCTTCATGTC 1140 

AAACCTGACG TTTCAGAGCT GATGTTTGCA GATTGGGTTG AATTTAGTCC ATACGAAATT 1200 

GGCATGGCTA AAt/.XSQTAC TTTTATGGCT CCCGACTTAT TTGGAAGCAA ATTTTTTATG 1260 

GGAACAGTCG TTAAGAAGTA TGAAGAAAAC CCCTTGCATT TCTTAATGGG TGTCTGGGGC 1320 

AGTGCCTTTT CCATATTGTT CAACAGAGTT TTGGGCGTTT CTGGTTCACA AAGCAGAGGC 1380 

TCCACAATGG AGGAAGAATT AGAAAATATT ACCACAAAGC ATATTGTGAG TAATGATAGC 1440 

TCGGACAGTG ATGATGAATC ACACGAACCC AAAGGCACTG AAAATGAAGA TGCTGGAAGT 1500 

GACTATCAAA GTGATAATCA AGCAAGTTGG ATTCATCGTA TGATAATGGC CTTGGTGAGT 1560

GATTCAGCTT TATTCAATAC CAGAGAAGGA CGTGCTGGGA AGGTACACAA CTTCATGCTG 1620 

GGCTTGAATC TCAATACATC TTATCCACTG TCTCCTTTGA GTGACTTTGC CACACAGGAC 1680

TCCTTTGATG ATGATGAACT GGATGCAGCT GTAGCAGATC CTGATGAATT TGAGCGAATA 1740

TATGAGCCTC TGGATGTCAA AAGTAAAAAG ATTCATGTAG TGGACAGTGG GCTCACATTT 1800

AACCTGCCGT ATCCCTTGAT ACTGAGACCT CAGAGAGGGG TTGATCTCAT AATCTCCTTT 1860 

GACTTTTCTG CAAGGCCAAG TGACTCTAGT CCTCCGTTCA AGGAACTTCT ACTTGCAGAA 1920 

AAGTGGGCTA AAATGAACAA GCTCCCCTTT CCAAAGATTG ATCCTTATGT GTTTGATCGG 1980 

GAAGGGCTGA AGGAGTGCTA TGTCTTTAAA CCCAAGAATC CTGATATGGA GAAAGATTGC 2040 

CCAACCATCA TCCACTTTGT TCTGGCCAAC ATCAACTTCA GAAAGTACAA GGCTCCAGGT 2100 

GTTCCAAGGG AAACTGAGGA AGAGAAAGAA ATCGCTGACT TTGATATTTT TGATGACCCA 2160 

GAATCACCAT TTTCAACCTT CAATTTTCAA TATCCAAATC AAGCATTCAA AAGACTACAT 2220 

GATCTTATGC ACTTCAATAC TCTGAACAAC ATTGATGTGA TAAAAGAAGC CATGGTTGAA 2280 

AGCATTGAAT ATAGAAGACA GAATCCATCT CGTTGCTCTG TTTCCCTTAG TAATGTTGAG 2340 

GCAAGAAGAT TTTTCAACAA GGAGTTTCTA AGTAAACCCA AAGCATAGTT CATGTACTGG 2400

AAATGGCAGC AGTTTCTGAT GCTGAGGCAG TTTGCAATCC CATGACAACT GGATTTAAAA 2460 

GTACAGTACA GATAGTCGTA CTGATCATGA GAGACTGGCT GATACTCAAA GTTGCAGTTA 2520

CTTAGCTGCA TGAGAATAAT ACTATTATAA GTTAGGTGAC AAATGATGTT GATTATGTAA 2580 

GGATATACTT AGCTACATTT TCAGTCAGTA TGAACTTCCT GATACAAATG TAGGGATATA 2640 

TACTGTATTT TTAAACATTT CTCACCAACT TTCTTATGTG TGTTCTTTTT AAAAATTTTT 2700 

TTTCTTTTAA AATATTTAAC AGTTCAATCT CAATAAGACC TCGCATTATG TATGAATGTT 2760 

ATTCACTGAC TAGATTTATT CATACCATGA GACAACACTA TTTTTATTTA TATATGCATA 2820 
TATATACATA CATGAAATAA ATACATCAAT ATAAAAATAA AAAAAAACGG AATTC 



ACA1 DNA sequence

Gene name: tissue factor pathway inhibitor 2 (TFPI2), placental protein 5
Unigene number: Hs.78045
Probeset Accession #: D29992 
Nucleic Acid Accession #: D29992.1 

Coding sequence: 57-764 (predicted start/stop codons underlined) 

GCCGCCAGCG GCTTTCTCGG ACGCCTTGCC CAGCGGGCCG CCCGACCCCC TGCACCATGG 60

ACCCCGCTCG CCCCCTGGGG CTGTCGATTC TGCTGCTTTT CCTGACGGAG GCTGCACTGG 120 

GCGATGCTGC TCAGGAGCCA ACAGGAAATA ACGCGGAGAT CTGTCTCCTG CCCCTAGACT 180 

ACGGACCCTG CCGGGCCCTA CTTCTCCGTT ACTACTACGA CAGGTACACG CAGAGCTGCC 240 

GCCAGTTCCT GTACGGGGGC TGCGAGGGCA ACGCCAACAA TTTCTACACC TGGGAGGCTT 300

GCGACGATGC TTGCTGGAGG ATAGAAAAAG TTCCCAAAGT TTGCCGGCTG CAAGTGAGTG 360

TGGACGACCA GTGTGAGGGG TCCACAGAAA AGTATTTCTT TAATCTAAGT TCCATGACAT 420

GTGAAAAATT CTTTTCCGGT GGGTGTCACC GGAACCGGAT TGAGAACAGG TTTCCAGATG 480 

AAGCTACTTG TATGGGCTTC TGCGCACCAA AGAAAATTCC ATCATTTTGC TACAGTCCAA 540

AAGATGAGGG ACTGTGCTCT GCCAATGTGA CTCGCTATTA TTTTAATCCA AGATACAGAA 600 

CCTGTGATGC TTTCACCTAT ACTGGCTGTG GAGGGAATGA CAATAACTTT GTTAGCAGGG 660 

AGGATTGCAA ACGTGCATGT GCAAAAGCTT TGAAAAAGAA AAAGAAGATG CCAAAGCTTC 720 

GCTTTGCCAG TAGAATCCGG AAAATTCGGA AGAAGCAATT TTAAACATTC TTAATATGTC 780

ATCTTGTTTG TCTTTATGGC TTATTTGCCT TTATGGTTGT ATCTGAAGAA TAATATGACA 840 

GCATGAGGAA ACAAATCATT GGTGATTTAT TCACCAGTTT TTATTAATAC AAGTCACTTT 900 

TTCAAAAATT TGGATTTTTT TATATATAAC TAGCTGCTAT TCAAATGTGA GTCTACCATT 960 

TTTAATTTAT GGTTCAACTG TTTGTGAGAC GAATTCTTGC AATGCATAAG ATATAAAAGC 1020 

AAATATGACT CACTCATTTC TTGGGGTCGT ATTCCTGATT TCAGAAGAGG ATCATAACTG 1080

AAACAACATA AGACAATATA ATCATGTGCT TTTAACATAT TTGAGAATAA AAAGGACTAG 1140
CC 



ACB8 DNA sequence 
Gene name: myosin X 
Unigene number: Hs.61638
Probeset Accession #: N77151 
Nucleic Acid Accession #: NM_012334 

Coding sequence: 223-6399 (predicted start/stop codons underlined) 

GAGACAAAGG CTGCCGTCGG GACGGGCGAG TTAGGGACTT GGGTTTGGGC GAACAAAAGG 60 

TGAGAAGGAC AAGAAGGGAC CGGGCGATGG CAGC7GGGGA GCCCCGCGGG CGCGCGTCCT 120 

CGGGAGTGGC GCCGTGACAC GCATGGTTTC CCCTaCCOG CGGCGGCGCT GACTTCCGCG 180 

AGTCGGAGCG GCACTCGGCG AGTCCGGGAC TGCGCTGGAA CAATGGATAA CTTCTTCACC 240 

GAGGGAACAC GGGTCTGGCT GAGAGAAAAT GGCCAGCATT TTCCAAGTAC TGTAAATTCC 300 

TGTGCAGAAG GCATCGTCGT CTTCCGGACA GACTATGGTC AGGTATTCAC TTACAAGCAG 360

AGCACAATTA CCCACCAGAA GGTGACTGCT ATGCACCCCA CGAACGAGGA GGGCGTGGAT 420

GACATGGCGT CCTTGACAGA GCTCCATGGC GGCTCCATCA TGTATAACTT ATTCCAGCGG 480

TATAAGAGAA ATCAAATATA TACCTACATC GGCTCCATCC TGGCCTCCGT GAACCCCTAC 540

CAGCCCATCG CCGGGCTGTA CGAGCCTGCC ACCATGGAGC AGTACAGCCG GCGCCACCTG 600 

GGCGAGCTGC CCCCGCACAT CTTCGCCATC GCCAACGAGT GCTACCGCTG CCTGTGGAAG 660 

CGCTACGACA ACCAGTGCAT CCTCATCAGT GGTGAAAGTG GGGCAGGTAA AACCGAAAGC 720

ACTAAATTGA TCCTCAAGTT TCTGTCAGTC ATCAGTCAAC AGTCTTTGGA ATTGTCCTTA 780 

AAGGAGAAGA CATCCTGTGT TGAACGAGCT ATTCTTGAAA GCAGCCCCAT CATGGAAGCT 840 

TTCGGCAATG CGAAGACCGT GTACAACAAC AACTCTAGTC GCTTTGGGAA GTTTGTTCAG 900 

CTGAACATCT GTCAGAAAGG AAATATTCAG GGCGGGAGAA TTGTAGATTA TTTATTAGAA 960

AAAAACCGAG TAGTAAGGCA AAATCCCGGG GAAAGGAATT ATCACATATT TTATGCACTG 1020 

CTGGCAGGGC TGGAACATGA AGAAAGAGAA GAATTTTATT TATCTACGCC AGAAAACTAC 1080 

CACTACTTGA ATCAGTCTGG ATGTGTAGAA GACAAGACAA TCAGTGACCA GGAATCCTTT 1140 

AGGGAAGTTA TTACGGCAAT GGACGTGATG CAGTTCAGCA AGGAGGAAGT TCGGGAAGTG 1200 

TCGAGGCTGC TTGCTGGTAT ACTGCATCTT GGGAACATAG AATTTATCAC TGCTGGTGGG 1260

GCACAGGTTT CCTTCAAAAC AGCTTTGGGC AGATCTGCGG AGTTACTTGG GCTGGACCCA 1320 

ACACAGCTCA CAGATGCTTT GACCCAGAGA TCAATGTTCC TCAGGGGAGA AGAGATCCTC 1380 

ACGCCTCTCA ATGTTCAACA GGCAGTAGAC AGCAGGGACT CCCTGGCCAT GGCTCTGTAT 1440 

GCGTGCTGCT TTGAGTGGGT AATCAAGAAG ATCAACAGCA GGATCAAAGG CAATGAGGAC 1500 

TTCAAGTCTA TTGGCATCCT CGACATCTTT GGATTTGAAA ACTTTGAGGT TAATCACTTT 1560

GAACAGTTCA ATATAAACTA TGCAAACGAG AAACTTCAGG AGTACTTCAA CAAGCATATT 1620

TTTTCTTTAG AACAACTAGA ATATAGCCGG GAAGGATTAG TGTGGGAAGA TATTGACTGG 1680

ATAGACAATG GAGAATGCCT GGACTTGATT GAGAAGAAAC TTGGCCTCCT AGCCCTTATC 1740 

AATGAAGAAA GCCATTTTCC TCAAGCCACA GACAGCACCT TATTGGAGAA GCTACACAGT 1800 

CAGCATGCGA ATAACCACTT TTATGTGAAG CCCAGAGTTG CAGTTAACAA TTTTGGAGTG 1860

AAGCACTATG CTGGAGAGGT GCAATATGAT GTCCGAGGTA TCTTGGAGAA GAACAGAGAT 1920 

ACATTTCGAG ATGACCTTCT CAATTTGCTA AGAGAAAGCC GATTTGACTT TATCTACGAT 1980 

CTTTTTGAAC ATGTTTCAAG CCGCAACAAC CAGGATACCT TGAAATGTGG AAGCAAACAT 2040 

CGGCGGCCTA CAGTCAGCTC ACAGTTCAAG GACTCACTGC ATTCCTTAAT GGCAACGCTA 2100 

AGCTCCTCTA ATCCTTTCTT TGTTCGCTGT ATCAAGCCAA ACATGCAGAA GATGCCAGAC 2160

CAGTTTGACC AGGCGGTTGT GCTGAACCAG CTGCGGTACT CAGGGATGCT GGAGACTGTG 2220 

AGAATCCGCA AAGCTGGGTA TGCGGTCCGA AGACCCTTTC AGGACTTTTA CAAAAGGTAT 2280

AAAGTGCTGA TGAGGAATCT GGCTCTGCCT GAGGACGTCC GAGGGAAGTG CACGAGCCTG 2340

CTGCAGCTCT ATGATGCCTC CAACAGCGAG TGGCAGCTGG GGAAGACCAA GGTCTTTCTT 2400

CGAGAATCCT TGGAACAGAA ACTGGAGAAG CGGAGGGAAG AGGAAGTGAG CCACGCGGCC 2460

ATGGTGATTC GGGCCCATGT CTTGGGCTTC TTAGCACGAA AACAATACAG AAAGGTCCTT 2520

TATTGTGTGG TGATAATACA GAAGAATTAC AGAGCATTCC TTCTGAGGAG GAGATTTTTG 2580 

CACCTGAAAA AGGCAGCCAT AGTTTTCCAG AAGCAACTCA GAGGTCAGAT TGCTCGGAGA 2640 

GTTTACAGAC AATTGCTGGC AGAGAAAAGG GAGCAAGAAG AAAAGAAGAA ACAGGAAGAG 2700

GAAGAAAAGA AGAAACGGGA GGAAGAAGAA AGAGAAAGAG AGAGAGAGCG AAGAGAAGCC 2760

GAGCTCCGCG CCCAGCAGGA AGAAGAAACG AGGAAGCAGC AAGAACTCGA AGCCTTGCAG 2820 

AAGAGCCAGA AGGAAGCTGA ACTGACCCGT GAACTGGAGA AACAGAAGGA AAATAAGCAG 2880 

GTGGAAGAGA TCCTCCGTCT GGAGAAAGAA ATCGAGGACC TGCAGCGCAT GAAGGAGCAG 2940 

CAGGAGCTGT CGCTGACCGA GGCTTCCCTG CAGAAGCTGC AGGAGCGGCG GGACCAGGAG 3000

CTCCGCAGGC TGGAGGAGGA AGCGTGCAGG GCGGCCCAGG AGTTCCTCGA GTCCCTCAAT 3060

TTCGACGAGA TCGACGAGTG TGTCCGGAAT ATCGAGCGGT CCCTGTCGGT GGGAAGCGAA 3120 

TTTTCCAGCG AGCTGGCTGA GAGCGCATGC GAGGAGAAGC CCAACTTCAA CTTCAGCCAG 3180 

CCCTACCCAG AGGAGGAGGT CGATGAGGGC TTCGAAGCCG ACGACGACGC CTTCAAGGAC 3240 

TCCCCCAACC CCAGCGAGCA CGGCCACTCA GACCAGCGAA CAAGTGGCAT CCGGACCAGC 3300 

GATGACTCTT CAGAGGAGGA CCCATACATG AACGACACGG TGGTGCCCAC CAGCCCCAGT 3360

GCGGACAGCA CGGTGCTGCT CGCCCCATCA GTGCAGGACT CCGGGAGCCT ACACAACTCC 3420 

TCCAGCGGCG AGTCCACCTA CTGCATGCCC CAGAACGCTG GGGACTTGCC CTCCCCAGAC 3480

GGCGACTACG ACTACGACCA GGATGACTAT GAGGACGGTG CCATCACTTC CGGCAGCAGC 3540

GTGACCTTCT CCAACTCCTA CGGCAGCCAG TGGTCCCCCG ACTACCGCTG CTCTGTGGGG 3600

ACCTACAACA GCTCGGGTGC CTACCGGTTC AGCTCTGAGG GGGCGCAGTC CTCGTTTGAA 3660

GATAGTGAAG AGGACTTTGA TTCCAGGTTT GATACAGATG ATGAGCTTTC ATACCGGCGT 3720

GACTCTGTGT ACAGCTGTGT CACTCTGCCG TATTTCCACA GCTTTCTGTA CATGAAAGGT 3780

GGCCTGATGA ACTCTTGGAA ACGCCGCTGG TGCGTCCTCA AGGATGAAAC CTTCTTGTGG 3840

TTCCGCTCCA AGCAGGAGGC CCTCAAGCAA GGCTGGCTCC ACAAAAAAGG GGGGGGCTCC 3900

TCCACGCTGT CCAGGAGAAA TTGGAAGAAG CGCTGGTTTG TCCTCCGCCA GTCCAAGCTG 3960

ATGTACTTTG AAAACGACAG CGAGGAGAAG CTCAAGGGCA CCGTAGAAGT GCGAACGGCA 4020 

AAAGAGATCA TAGATAACAC CACCAAGGAG AATGGGATCG ACATCATTAT GGCCGATAGG 4080 

ACTTTCCACC TGATTGCAGA GTCCCCAGAA GATGCCAGCC AGTGGTTCAG CGTGCTGAGT 4140

CAGGTCCACG CGTCCACGGA CCAGGAGATC CAGGAGATGC ATGATGAGCA GGG^^ACCCA 4200

CAGAATGCTG TGGGCACCTT GGATGTGGGG CTGATTGATT CTGTGTGTGC CTC~ 3ACAGC 4260

CCTGATAGAC CCAACTCGTT TGTGATCATC ACGGCCAACC GGGTGCTGCA CTGCAACGCC 4320 

GACACGCCGG AGGAGATGCA CCACTGGATA ACCCTGCTGC AGAGGTCCAA AGGGGACACC 4380

AGAGTGGAGG GCCAGGAATT CATCGTGAGA GGATGGTTGC ACAAAGAGGT GAAGAACAGT 4440

CCGAAGATGT CTTCACTGAA ACTGAAGAAA CGGTGGTTTG TACTCACCCA CAATTCCCTG 4500

GATTACTACA AGAGTTCAGA GAAGAACGCG CTCAAACTGG GGACCCTGGT CCTCAACAGC 4560

CTCTGCTCTG TCGTCCCCCC AGATGAGAAG ATATTCAAAG AGACAGGCTA CTGGAACGTC 4620

ACCGTGTACG GGCGCAAGCA CTGTTACCGG CTCTACACCA AGCTGCTCAA CGAGGCCACC 4680

CGGTGGTCCA GTGCCATTCA AAACGTGACT GACACCAAGG CCCCGATCGA CACCCCCACC 4740

CAGCAGCTGA TTCAAGATAT CAAGGAGAAC TGCCTGAACT CGGATGTGGT GGAACAGATT 4800

TACAAGCGGA ACCCGATCCT TCGATACACC CATCACCCCT TGCACTCCCC GCTCCTGCCC 4860

CTTCCGTATG GGGACATAAA TCTCAACTTG CTCAAAGACA AAGGCTATAC CACCCTTCAG 4920

GATGAGGCCA TCAAGATATT CAATTCCCTG CAGCAACTGG AGTCCATGTC TGACCCAATT 4980

CCAATAATCC AGGGCATCCT ACAGACAGGG CATGACCTGC GACCTCTGCG GGACGAGCTG 5040

TACTGCCAGC TTATCAAACA GACCAACAAA GTGCCCCACC CCGGCAGTGT GGGCAACCTG 5100

TACAGCTGGC AGATCCTGAC ATGCCTGAGC TGCACCTTCC TGCCGAGTCG AGGGATTCTC 5160

AAGTATCTCA AGTTCCATCT GAAAAGGATA CGGGAACAGT TTCCAGGAAC CGAGATGGAA 5220

AAATACGCTC TCTTCACTTA CGAATCTCTT AAGAAAACCA AATGCCGAGA GTTTGTGCCT 5280

TCCCGAGATG AAATAGAAGC TCTGATCCAC AGGCAGGAAA TGACATCCAC GGTCTATTGC 5340

CATGGCGGCG GCTCCTGCAA GATCACCATC AACTCCCACA CCACTGCTGG GGAGGTGGTG 5400

GAGAAGCTGA TCCGAGGCCT GGCCATGGAG GACAGCAGGA ACATGTTTGC TTTGTTTGAA 5460

TACAACGGCC ACGTCGACAA AGCCATTGAA AGTCGAACCG TCGTAGCTGA TGTCTTAGCC 5520

AAGTTTGAAA AGCTGGCTGC CACATCCGAG GTTGGGGACC TGCCATGGAA ATTCTACTTC 5580

AAACTTTACT GCTTCCTGGA CACAGACAAC GTGCCAAAAG ACAGTGTGGA GTTTGCATTT 5640

ATGTTTGAAC AGGCCCACGA AGCGGTTATC CATGGCCACC ATCCAGCCCC GGAAGAAAAC 5700

CTCCAGGTTC TTGCTGCCCT GCGACTCCAG TATCTGCAGG GGGATTATAC TCTGCACGCT 5760

GCCATCCCAC CTCTCGAAGA GGTTTATTCC CTGCAGAGAC TCAAGGCCCG CATCAGCCAG 5820

TCAACCAAAA CCTTCACCCC TTGTGAACGG CTGGAGAAGA GGCGGACGAG CTTCCTAGAG 5880

GGGACCCTGA GGCGGAGCTT CCGGACAGGA TCCGTGGTCC GGCAGAAGGT CGAGGAGGAG 5940

CAGATGCTGG ACATGTGGAT TAAGGAAGAA GTCTCCTCTG CTCGAGCCAG TATCATTGAC 6000

AAGTGGAGGA AATTTCAGGG AATGAACCAG GAACAGGCCA TGGCCAAGTA CATGGCCTTG 6060

ATCAAGGAGT GGCCTGGCTA TGGCTCGACG CTGTTTGATG TGGAGTGCAA GGAAGGTGGC 6120

TTCCCTCAGG AACTCTGGTT GGGTGTCAGC GCGGACGCCG TCTCCGTCTA CAAGCGTGGA 6180

GAGGGAAGAC CACTGGAAGT CTTCCAGTAT GAACACATCC TCTCTTTTGG GGCACCCCTG 6240

GCGAATACGT ATAAGATCGT GGTCGATGAG AGGGAGCTGC TCTTTGAAAC CAGTGAGGTG 6300

GTGGATGTGG CCAAGCTCAT GAAAGCCTAC ATCAGCATGA TCGTGAAGAA GCGCTACAGC 6360

ACGACACGCT CCGCCAGCAG CCAGGGCAGC TCCAGGTGAA GGCGGGACAG AGCCCACCTG 6420

TCTTTGCTAC CTGAACGCAC CACCCTCTGG CCTAGGCTGG CTCCAGTGTG CCATGCCCAG 6480

CCAAAACAAA CACAGAGCTG CCCAGGCTTT CTGGAAGCTT CTGGTCTGAG GGAGGTGTCT 6540

CCGAGGATCC TTTTGCCTGC CGCCTTCATT GATCCTGTAT TAAGCTGTCA ACTTTAACAG 6600

TCTGCACAGT TTCCAAAGCT TTACTACTCT TAGAGGACAC ATGCCTTAAA AAAGGAGGGG 6660

AGGAACCACG CTGCCACCAA AGCAGCCGGA AGTGCCTTAA CTTGTGGAAC CAACACTAAT 6720

CGACCGTAAC TGTGCTACTG AAGGGAACTG CCTTTCCCCC TTCTGGGGGA GACTTAACAG 6780

AGCGTGGAAG GGGGGCATTC TCTGTCAATG ATGCACTAAC CTCCCAACCT GATTTCCCCG 6840

AATCTGAGGG AAGGTGAGGG AGTGGGAAGG GGGATGGAGA GCTCGAGGGG ACAGTGTGTT 6900

TGAGCTGGAG TGCTGCGGGC AGCCTTTCTC ATGGAATGAC ATGAATCAAC TTTTTTCTTT 6960

GTTTCATCTT TTAAGTGTAC GTGCTTGCCT GTTCGTGCAT GTGTTCATAA ACTCAACACT 7020

TTAATCATGG TTTCATGAGC ATTAAAAAGC AAAGGGAAAA AGGATGTGTA ATGGTGTACA 7080

CAGTCTGTAT ATTTTAATAA TGCAGAGCTA TAGTCTCAAT TGTTACTTTA TAAGGTGGTT 7140

TTATTAACAA ACCCAAATCC TGGATTTTCC TGTCTTTGCT GTATTTTGAA AAACACGTGT 7200

TGACTCCATT GTTTTACATG TAGCAAAGTC TGCCATCTGT GTCTGCTGTA TTATAAACAG 7260

ATAAGCAGCC TACAAGATAA CTGTATTTAT AAACCACTCT TCAACAGCTG GCTCCAGTGC 7320

TGGTTTTAGA ACAAGAATGA AGTCATTTTG GAGTCTTTCA TGTCTAAAAG ATTTAAGTTA 7380

AAAACAAAGT GTTACTTGGA AGGTTAGCTT CTATCATTCT GGATAGATTA CAGATATAAT 7440

AACCATGTTG ACTATGGGGG AGAGACGCTG CATTCCAGAA ACGTCTTAAC ACTTGAGTGA 7500

ATCTTCAAAG GACCCTGACA TTAAATGCTG AGGCTTTAAT ACACACATAT TTTATCCCAA 7560

GTTTATAATG GTGGTCTGAA CAAGGCACCT GTAAATAAAT CAGCATTTAT GACCAGAAGA 7620

AAAATAATCT GGTCTTGGAC TTTTTATTTT TATATGGAAA AGTTTTAAGG ACTTGGGCCA 7680

ACTAAGTCTA CCCACACGAA AAAAGAAATT TGCCTTGTCC CTTTGTGTAC AACCATGCAA 7740

AACTGTTTGT TGGCTCACAG AAGTTCTGAC AATAAAAGAT ACTAGCT



ACC3 DNA sequence 
Gene name: calcitonin receptor-like (CALCRL)
Unigene number: Hs.152175
Probeset Accession #: L76380
Nucleic Acid Accession #: NM_005795

Coding sequence: 555-1940 (predicted start/stop codons underlined)

GCACGAGGGA ACAACCTCTC TCTCTSCAGC AGAGAGTGTC ACCTCCTGCT TTAGGACCAT 60 
CAAGCTCTGC TAACTGAATC TCATCCTAAT TGCAGGATCA CATTGCAAAG CTTTCACTCT 120
TTCCCACCTT GCTTGTGGGT AAATCTCTTC TGCGGAATCT CAGAAAGTAA AGTTCCATCC 180
TGAGAATATT TCACAAAGAA TTTCCTTAAG AGCTGGACTG GGTCTTGACC CCTGGAATTT 240
AAGAAATTCT TAAAGACAAT GTCAAATATG ATCCAAGAGA AAATGTGATT TGAGTCTGGA 300
GACAATTGTG CATATCGTCT AATAATAAAA ACCCATACTA GCCTATAGAA AACAATATTT 360
GAATAATAAA AACCCATACT AGCCTATAGA AAACAATATT TGAAAGATTG CTACCACTAA 420
AAAGAAAACT ACTACAACTT GACAAGACTG CTGCAAACTT CAATTGGTCA CCACAACTTG 480
ACAAGGTTGC
ATTTGGGCTT
TTATGATTCT
TTACTAGAAA
CCATTCAACA
ACGATGTTGC 
ATCCATCAGA 
CAAGCAACAG 
AGACTGCACT 

TGCTTATCTC
TACACAAAAA 
CTGCAGTGGC 
AGTTCATTCA 
ACCTACACAC 

ATTTTCTTGG
TATATTACAA 
GCCCAATTTG 
TCATCACCAA 
GAGCTACTCT 

CTGAAGGAAA

AGGGTCTTTT
GAAGAAACTG
TTCGTAGTGC

GTCCTAGTGA 
CAGAAAATTT 
AACTCAAGGA 
GGGAATGTCA 
ATCCAGCTCT 
CACTATGCCT 
ACAATCAACT
AAATGGCTGT
GACCTAGCTA
TCCCATCTTG
TAACTACCCT
CTATGAAAAG
ATCTTGTGGC
TTCTATATCA

TGTCTTACCA
TCTACTGTAT

ATTTTCTTGG
TTTATTTTAT 
AATGCAACAA 
AATAGAGTCT 
TATAAAACAA
AATGATGGAG 
TGTTACAGCA 
TAAAATCATG 
AGCAGAAGGC 
AGCAGGAACT 
AAAAGTTACA 
AACATGGACA 
AAATTTGTTT 
GCTTGGCATA 
TCTGTTCTTC 
CAACAACCAG 
TCTTTACCTG 
ACTCATTGTG 
CTGGGGATTT 
TGACAATTGC 
TGCTGCTTTA 
GTTAAAAGTT 
TATCTTGGTG 
GATTGCAGAG 
GGTCTCTACC 
GAATCAATAC 
GTCTTACACA 
ACACTTAAAT 
ATATAATTGA 
CTTGGACCCA 
TAAAGAAGAG 
ATGTGGGAAA 
GATGTGACGC 
TTTCTGAGCT 
AAAACTAAAC 
AGGTCTATAA 
ATTGGGGCAG 
CTCAAATGGA 
CAACTGAGTA 
ATATCCATTG 
TTAGGAAAAC 
AACAGTGGGA 
AAACAAATTA 
AATTTTGTAA 
AGTCTCAAAT 
TGTGTGTATG 
GGAATGCT 



GATTGCTACA 
AAAAAGTGTA 
GAATTAGAAG 
ACAGCTCAAT
GTTTACTGCA 
GAATCAATGC 
AAGATCTGTG 
AATTATACCC 
TACCTGACCA 
TTCTTTTATT 
TCATTTGTTT 
GCCTTAGTAG 
ATGGGCTGTA 
GTGGCCGTGT 
CCACTGATTC 
TGGATCAGTT 
CTGGTGAATC 
ACACACCAAG 
CCATTGCTTG 
GAGGTATATG 
ATTTTCTGCT 
AAAATCCAAT 
GTGTCAACAA 
GGA&AAAGCA 
AAATAGAAGG 
TGACTCTGTA 
CCTTCACATG 
AAAGAAATCC 
TACTAACCTG 
GGTGTAAGCC 
ATACATGTTG 
ACATGAAGGG 
TTGACTTTTT 
CAATACCAGA 
CAATTGTTAT 
TGGAAACTGG 
ATCTTAGTTG 
GGGAATTCCT 
GCAATCATTT
AAAGAAATTG 
CAAATACATA 
TTAATATCTG 



ACTTCTAGTT 
CCCTGTATTT 
AGAGTCCTGA 
ATGAATGTTA 
ACAGAACCTG 
AGCTCTGCCC 
ACCAAGATGG 
AGTGTAATGT 
TAATTGGACA 
TCAAGAGCCT 
GTAACTCTGT 
CCACAAATCC 
ATTACTTTTG 
TTGCAGAGAA 
CTGCTTGTAT 
CTGATACCCA 

CGGAATCCAA 
GCATTGAATT 
ACTACATCAT 
TCTTTAATGG 
TTGGAAACAG 
TCAGTGATGG 
TCCATGATAT 
ATGGTTGTCT 
GCCAGAAGAC 
AAATTAGTAG 
TGGTTTGTAA 
ACATCACCAA
AGTTCCAGCA 
GGCATGATTC 
AAAATTAGCT 
TTTTTTCCCA 
AGTGAATTAT 
GATCTACTCA 
ATGAACAGGA 
ATGCTACAAA 
AGCTGTAAAT
TATATAAAGA 
TGAAAAATGA 
CAACCTATGT 
ATACTGTATC 



TATGTTATAC 
TCTGGTTCTC 
GGACTCAATT 
CCAAAAGATT 
GGATGGATGG 
TGATTACTTT 
AAACTGGTTT 
TAACACCCAC 
CGGATTGTCT 
AAGTTGCCAA 
TGTAACAATC 
TGTTAGTTGC 
GATGCTCTGT 
GCAACATTTA 
ACATGCCATT 
TCTCCTCTAC 
GTTAAATATT 
TCTGTACATG 
TGTGCTGATT 
GCACATCCTT 
AGAGGTTCAA 
CTTTTCCAAC 
TCCAGGTTAT 
TGAAAATGTT 
CACTGTTTGG 
TTCAATATTA 
TGTGTTGATA 
TGTTTGTCAG 
GTGTGGAATT 
CACCATTGAT 
TACCCTTATT 
TTTAGTTTTA 
GAGTGCCGTA 
CCCTGCTGGC 
TTTGCTGACA 
TGTATAATAT 
ACACCTTGTC 
ATAAATTTTG 
AAATCAATGA 
GCTTGTAAAT 
AATTTTTAAA 
TGGGCTGATT 



AGCATATTTC 540

TTGCCTTTTT 600 

CAGTTGGGAG 660 

ATGCAAGACC 720 

CTCTGCTGGA 780 

CAGGACTTTG 840 

AGACATCCAG 900 

GAGAAAGTGA 960 

ATTGCATCAC 1020 

AGGATTACCT 1080 

ATTCACCTCA 1140 

AAAGTGTCCC 1200 

GAAGGCATTT 1260 

ATGTGGTATT 1320 

GCTAGAAGCT 1380 

ATTATCCATG 1440

GTACGCGTTC 1500 

AAAGCTGTGA 1560 

CCATGGCGAC 1620 

ATGCACTTCC 1680 

GCAATTCTGA 1740 

TCAGAAGCTC 1800 

AGTCATGACT 1860 

CTCTTAAAAC 1920 

TGCTTCTCCT 1980 

AATGACTTTG 2040

AGAGTGTAAC 2100 

TAAATACTCC 2160 

GGAGAAAAGC 2220

GAATTCAAAC 2280

CSCCCCAAGA 2340

AAACTCTTTA 2400 

GTCCTTTTTG 2460 

TTTCTTTTCT 2520 

CATCAGTTAT 2580 

GCAATCTTAC 2640 

AACCTCTTCC 2700 

CCCTTCCATT 2760 

AGGATTTCTT 2820 

ACTCCATTAT 2880 

GCAAATATAT 2940 

TTTTAAATAA 3000 



ACC4 DNA sequence

Gene name: Homo sapiens mRNA; cDNA DKFZp586E1624 
Unigene number: Hs.94030
Probeset Accession #: AA452000 
Nucleic Acid Accession #: AL110152.1 

Coding sequence: no ORF identified, possible frameshifts 



ACGCGTCCGA 
AAAAGAATTA 

TTTTCTGATT
ATTCTCCCTT 
TTACCATGGG 
AATTTAAGAC 
TCTCGGTAGA 

TTCAAATTCC
AATGATATTT 
TTATTTTTAT 
AGTAAGTGTA 
TTCTAGCCAG

TTACAAAATA
ATGATTTCTG 
AATAGGGATA 
GTCTTGAGGC 



AGACATTAAG 
TTTTATTAAC 
TGCTTTATTG 
GGCAAGATTT 
GCAAGGTGCC 
ACTTATAGTA 
GGCTTCTGTC
AGTAAGGCAT 
ATGTTAATAT 
TACTACTTTG 
ACTTTTAAAG
TGAGTTGTGT 
TGTTGTCATT 
AGTTTCTTAC 
ATATTGATAT 
CAAGATTTAC



TAAAAAATTG 
CTGCTGGCAT 
AATGATTGAA 
CTCCCTATGA 
ATGATGTATT 
AGTGGACTCA 
<J5ACAGGCAG 
* jCACTTTTA 
1 AAATATCTT 
AATAGAGGAC 
TAAGTATATA 
TTTCATGTCT 
TTCATTTCAG 
TGCAAAGAAC 
ATCTGTTGCT
CACGTTTGCC 



GAACTATGAT 
ATAATCTGGA 
TACTCATTTC 
GGGTAGTTAT 
CTTGGGTGCA 
TTCATAGATG 
AAGAGTGTAT 
AGAAATTAGA 
ATGTTACACT 
CATTATCCTT 
TCAGTGAGAG 
CATCAAAAGA 
TTGTAACATA 
AGTTATAAAT 
ACATATTTAA 
CAGTGTATTG 



TTTTCTTTGT 
GTTCTTTTCA 
TTTCTAAAAA 
TATTTGAGTC
TTGGTTTTTT 
AGTTTCAGAA 
TCCTCACTTT 
ATTTTTCTAT 
GGGAGTAATT 
CTTTCTTCAG 
TAGGCTTGTT 
CAATACCACA 
GGAAAATAGA 
TGGTATACAT 
GAATCATTCT 
AATTGGTGGT 



CATTTTTTAA 60

CAACCTTACT 120 

TATGTTGTAA 180 

TGCCAAGTGG 240 

GCGCATTGTA 300

CCTTTTACGT 360

TTTTTTTGTC 420

CATCTATGCA 480

TGAGGTGCAA 540

AAAACTAAGA 600 

TTACAACTAT 660 

TTGCATCATT 720

TATTTCCTAG 780 

GTGTCTCTGT 840

ATCTTATGTT 900 

AGAAGGTAGT 960 
TCCATGTTCC ATTTGTAGAT CTTTAAGATT TTATCTTTGA TAACTTTAAT AGAATGTGGC 1020

TCAGTTCTGG TCCTTCAAGC CTGTATGGTT TGGATTTTCA GTAGGGGACA GTTGATGTGG 1080 

AGTCAATCTC TTTGGTACAC AGGAAGCTTT ATAAAATTTC ATTCACGAAT CTCTTATTTT 1140 

GGGAAGCTGT TTTGCATATG AGAAGAACAC TGTTGAAATA AGGAACTAAA GCTTTATATA 1200 

TTGATCAAGG TGATTCTGAA AGTTTTAATT TTTAATGTTG TAATGTTATG TTATTGTTAA 1260

TTGTACTTTA TTATGTATTC AATAGAAAAT CATGATTTAT TAATAAAAGC TTAAATTCTC 1320 
ATCTAAAAAA AAAAAAAAAA A 

ACC5 DNA sequence

Gene name: Selectin E (endothelial adhesion molecule 1)

Unigene number: Hs.89546 

Probeset Accession #: M24736

Nucleic Acid Accession #: NM_000450 
Coding sequence: 117-1949 (predicted start/stop codons underlined)

CCTGAGACAG AGGCAGCAGT GATACCCACC TGAGAGATCC TGTGTTTGAA CAACTGCTTC 60 

CCAAAACGGA AAGTATTTCA AGCCTAAACC TTTGGGTGAA AAGAACTCTT GAAGTCATGA 120

TTGCTTCACA GTTTCTCTCA GCTCTCACTT TGGTGCTTCT CATTAAAGAG AGTGGAGCCT 180 

GGTCTTACAA CACCTCCACG GAAGCTATGA CTTATGATGA GGCCAGTGCT TATTGTCAGC 240

AAAGGTACAC ACACCTGGTT GCAATTCAAA ACAAAGAAGA GATTGAGTAC CTAAACTCCA 300 

TATTGAGCTA TTCACCAAGT TATTACTGGA TTGGAATCAG AAAAGTCAAC AATGTGTGGG 360 

TCTGGGTAGG AACCCAGAAA CCTCTGACAG AAGAAGCCAA GAACTGGGCT CCAGGTGAAC 420

CCAACAATAG GCAAAAAGAT GAGGACTGCG TGGAGATCTA CATCAAGAGA GAAAAAGATG 480

TGGGCATGTG GAATGATGAG AGGTGCAGCA AGAAGAAGCT TGCCCTATGC TACACAGCTG 540

CCTGTACCAA TACATCCTGC AGTGGCCACG GTGAATGTGT AGAGACCATC AATAATTACA 600 

CTTGCAAGTG TGACCCTGGC TTCAGTGGAC TCAAGTGTGA GCAAATTGTG AACTGTACAG 660

CCCTGGAATC CCCTGAGCAT GGAAGCCTGG TTTGCAGTCA CCCACTGGGA AACTTCAGCT 720

ACAATTCTTC CTGCTCTATC AGCTGTGATA GGGGTTACCT GCCAAGCAGC ATGGAGACCA 780

TGCAGTGTAT GTCCTCTGGA GAATGGAGTG CTCCTATTCC AGCCTGCAAT GTGGTTGAGT 840

GTGATGCTGT GACAAATCCA GCCAATGGGT TCGTGGAATG TTTCCAAAAC CCTGGAAGCT 900

TCCCATGGAA CACAACCTGT ACATTTGACT GTGAAGAAGG ATTTGAACTA ATGGGAGCCC 960 

AGAGCCTTCA GTGTACCTCA TCTGGGAATT GGGACAACGA GAAGCCAACG TGTAAAGCTG 1020

TGACATGCAG GGCCGTCCGC CAGCCTCAGA ATGGCTCTGT GAGGTGCAGC CATTCCCCTG 1080

CTGGAGAGTT CACCTTCAAA TCATCCTGCA ACTTCACCTG TGAGGAAGGC TTCATGTTGC 1140

AGGGACCAGC CCAGGTTGAA TGCACCACTC AAGGGCAGTG GACACAGCAA ATCCCAGTTT 1200

GTGAAGCTTT CCAGTGCACA GCCTTGTCCA ACCCCGAGCG AGGCTACATG AATTGTCTTC 1260

CTAGTGCTTC TGGCAGTTTC CGTTATGGGT CCAGCTGTGA GTTCTCCTGT GAGCAGGGTT 1320

TTGTGTTGAA GGGATCCAAA AGGCTCCAAT GTGGCCCCAC AGGGGAGTGG GACAACGAGA 1380 

AGCCCACATG TGAAGCTGTG AGATGCGATG CTGTCCACCA GCCCCCGAAG GGTTTGGTGA 1440

GGTGTGCTCA TTCCCCTATT GGAGAATTCA CCTACAAGTC CTCTTGTGCC TTCAGCTGTG 1500 

AGGAGGGATT TGAATTATAT GGATCAACTC AACTTGAGTG CACATCTCAG GGACAATGGA 1560 

CAGAAGAGGT TCCTTCCTGC CAAGTGGTAA AATGTTCAAG CCTGGCAGTT CCGGGAAAGA 1620 

TCAACATGAG CTGCAGTGGG GAGCCCGTGT TTGGCACTGT GTGCAAGTTC GCCTGTCCTG 1680 

AAGGATGGAC GCTCAATGGC TCTGCAGCTC GGACATGTGG AGCCACAGGA CACTGGTCTG 1740

GCCTGCTACC TACCTGTGAA GCTCCCACTG AGTCCAACAT TCCCTTGGTA GCTGGACTTT 1800 

CTGCTGCTGG ACTCTCCCTC CTGACATTAG CACCATTTCT CCTCTGGCTT CGGAAATGCT 1860

TACGGAAAGC AAAGAAATTT GTTCCTGCCA GCAGCTGCCA AAGCCTTGAA TCAGACGGAA 1920 

GCTACCAAAA GCCTTCTTAC ATCCTTTAAG TTCAAAAGAA TCAGAAACAG GTGCATCTGG 1980

GGAACTAGAG GGATACACTG AAGTTAACAG AGACAGATAA CTCTCCTCGG GTCTCTGGCC 2040

CTTCTTGCCT ACTATGCCAG ATGCCTTTAT GGCTGAAACC GCAACACCCA TCACCACTTC 2100 

AATAGATCAA AGTCCAGCAG GCAAGGACGG CCTTCAACTG AAAAGACTCA GTGTTCCCTT 2160 

TCCTACTCTC AGGATCAAGA AAGTGTTGGC TAATGAAGGG AAAGGATATT TTCTTCCAAG 2220 

CAAAGGTGAA GAGACCAAGA CTCTGAAATC TCAGAATTCC TTTTCTAACT CTCCCTTGCT 2280

CGCTGTAAAA TCTTGGCACA GAAACACAAT ATTTTGTGGC TTTCTTTCTT TTGCCCTTCA 2340

CAGTGTTTCG ACAGCTGATT ACACAGTTGC TGTCATAAGA ATGAATAATA ATTATCCAGA 2400 

GTTTAGAGGA AAAAAATGAC TAAAAATATT ATAACTTAAA AAAATGACAG ATGTTGAATG 2460

CCCACAGGCA AATGCATGGA GGGTTGTTAA TGGTGCAAAT CCTACTGAAT GCTCTGTGCG 2520 

AGGGTTACTA TGCACAATTT AATCACTTTC ATCCCTATGG NATTCAGTGC TTCTTAAAGA 2580

GTTCTTAAGG ATTGTGATAT TTTTACTTGC ATTGAATATA NTATAATCTT CCATACTTCT 2640

TCATTCAATA CAAGTGTGGT AGGGACTTAA AAAACTTGTA AATGCTGTCA ACTATGATAT 2700

GGTAAAAGTT ACTTATTCTA GATTACCCCC TCATTGTTTA TTAACAAATT ATGTTACATC 2760 

TGTTTTAAAT TTATTTCAAA AAGGGAAACT ATTGTCCCCT AGCAAGGCAT GATGTTAACC 2820

AGAATAAAGT TCTGAGTGTT TTTACTACAG TTGTTTTTTG AAAACATGGT AGAATTGGAG 2880

AGTAAAAACT GAATGGAAGG TTTGTATATT GTCAGATATT TTTTCAGAAA TATGTGGTTT 2940

CCACGATGAA AAACTTCCAT GAGGCCAAAC GTTTTGAACT AATAAAAGCA TAAATGCAAA 3000

CACACAAAGG TATAATTTTA TGAATGTCTT TGTTGGAAAA GAATACAGAA AGATGGATGT 3060

GCTTTGCATT CCTACAAAGA TGTTTGTCAG ATGTGATATG TAAACATAAT TCTTGTATAT 3120 
TATGGAAGAT TTTAAATTCA CAATAGAAAC TCACCATGTA AAAGAGTCAT CTGGTAGATT 3180

TTTAACGAAT GAAGATGTCT AATAGTTATT CCCTATTTGT TTTCTTCTGT ATGTTAGGGT 3240 

GCTCTGGAAG AGAGGAATGC CTGTGTGAGC AAGCATTTAT GTTTATTTAT AAGCAGATTT 3300 

AACAATTCCA AAGGAATCTC CAGTTTTCAG TTGATCACTG GCAATGAAAA ATTCTCAGTC 3360 

AGTAATTGCC AAAGCTGCTC TAGCCTTGAG GAGTGTGAGA ATCAAAACTC TCCTACACTT 3420

CCATTAACTT AGCATGTGTT GAAAAAAAAA GTTTCAGAGA AGTTCTGGCT GAACACTGGC 3480

AACGACAAAG CCAACAGTCA AAACAGAGAT GTGATAAGGA TCAGAACAGC AGAGGTTCTT 3540

TTAAAGGGGC AGAAAAACTC TGGGAAATAA GAGAGAACAA CTACTGTGAT CAGGCTATGT 3600 

ATGGAATACA GTGTTATTTT CTTTGAAATT GTTTAAGTGT TGTAAATATT TATGTAAACT 3660 

GCATTAGAAA TTAGCTGTGT GAAATACCAG TGTGGTTTGT GTTTGAGTTT TATTGAGAAT 3720

TTTAAATTAT AACTTAAAAT ATTTTATAAT TTTTAAAGTA TATATTTATT TAAGCTTATG 3780 
TCAGACCTAT TTGACATAAC ACTATAAAGG TTGACAATAA ATGTGCTTAT GTTT 

ACC8 DNA sequence

Gene name: Chemokine (C-X-C motif), receptor 4 (fusin)
Unigene number: Hs.89414
Probeset Accession #: L06797 
Nucleic Acid Accession #: NM_003467 
Coding sequence: 89-1147 (predicted start/stop codons underlined)






GTTTGTTGGC TGCGGCAGCA GGTAGCAAAG TGACGCCGAG GGCCTGAGTG CTCCAGTAGC 60 

CACCGCATCT GGAGAACCAG CGGTTACCAT GGAGGGGATC AGTATATACA CTTCAGATAA 120

CTACACCGAG GAAATGGGCT CAGGGGACTA TGACTCCATG AAGGAACCCT GTTTCCGTGA 180

AGAAAATGCT AATTTCAATA AAATCTTCCT GCCCACCATC TACTCCATCA TCTTCTTAAC 240

TGGCATTGTG GGCAATGGAT TGGTCATCCT GGTCATGGGT TACCAGAAGA AACTGAGAAG 300

CATGACGGAC AAGTACAGGC TGCACCTGTC AGTGGCCGAC CTCCTCTTTG TCATCACGCT 360

TCCCTTCTGG GCAGTTGATG CCGTGGCAAA CTGGTACTTT GGGAACTTCC TATGCAAGGC 420

AGTCCATGTC ATCTACACAG TCAACCTCTA CAGCAGTGTC CTCATCCTGG CCTTCATCAG 480

TCTGGACCGC TACCTGGCCA TCGTCCACGC CACCAACAGT CAGAGGCCAA GGAAGCTGTT 540

GGCTGAAAAG GTGGTCTATG TTGGCGTCTG GATCCCTGCC CTCCTGCTGA CTATTCCCGA 600

CTTCATCTTT GCCAACGTCA GTGAGGCAGA TGACAGATAT ATCTGTGACC GCTTCTACCC 660

CAATGACTTG TGGGTGGTTG TGTTCCAGTT TCAGCACATC ATGGTTGGCC TTATCCTGCC 720

TGGTATTGTC ATCCTGTCCT GCTATTGCAT TATCATCTCC AAGCTGTCAC ACTCCAAGGG 780

CCACCAGAAG CGCAAGGCCC TCAAGACCAC AGTCATCCTC ATCCTGGCTT TCTTCGCCTG 840

TTGGCTGCCT TACTACATTG GGATCAGCAT CGACTCCTTC ATCCTCCTGG AAATCATCAA 900 

GCAAGGGTGT GAGTTTGAGA ACACTGTGCA CAAGTGGATT TCCATCACCG AGGCCCTAGC 960 

TTTCTTCCAC TGTTGTCTGA ACCCCATCCT CTATGCTTTC CTTGGAGCCA AATTTAAAAC 1020 

CTCTGCCCAG CACGCACTCA CCTCTGTGAG CAGAGGGTCC AGCCTCAAGA TCCTCTCCAA 1080 

AGGAAAGCGA GGTGGACATT CATCTGTTTC CACTGAGTCT GAGTCTTCAA GTTTTCACTC 1140

CAGCTAACAC AGATGTAAAA GACTTTTTTT TATACGATAA ATAACTTTTT TTTAAGTTAC 1200 

ACATTTTTCA GATATAAAAG ACTGACCAAT ATTGTACAGT TTTTATTGCT TGTTGGATTT 1260 

TTGTCTTGTG TTTCTTTAGT TTTTGTGAAG TTTAATTGAC TTATTTATAT AAATTTTTTT 1320 

TGTTTCATAT TGATGTGTGT CTAGGCAGGA CCTGTGGCCA AGTTCTTAGT TGCTGTATGT 1380

CTCGTGGTAG GACTGTAGAA AAGGGAACTG AACATTCCAG AGCGTGTAGT GAATCACGTA 1440

AAGCTAGAAA TGATCCCCAG CTGTTTATGC ATAGATAATC TCTCCATTCC CGTGGAACGT 1500

TTTTCCTGTT CTTAAGACGT GATTTTGCTG TAGAAGATGG CACTTATAAC CAAAGCCCAA 1560 

AGTGGTATAG AAATGCTGGT TTTTCAGTTT TCAGGAGTGG GTTGATTTCA GCACCTACAG 1620
TGTACAGTCT TGTATTAAGT TGTTAATAAA AGTACATGTT AAACTTACTT AGTGTTATG 


ACF2 DNA sequence 

Gene name: Endothelial cell-specific molecule 1
Unigene number: Hs.41716 
Probeset Accession #: X89426

Nucleic Acid Accession #: NM_007036 

Coding sequence: 56-610 (predicted start/stop codons underlined) 

CTTCCCACCA GCAAAGACCA CGACTGGAGA GCCGAGCCGG AGGCAGCTGG GAAACATGAA 60

GAGCGTCTTG CTGCTGACCA CGCTCCTCGT GCCTGCACAC CTGGTGGCCG CCTGGAGCAA 120

TAATTATGCG GTGGACTGCC CTCAACACTG TGACAGCAGT GAGTGCAAAA GCAGCCCGCG 180 

CTGCAAGAGG ACAGTGCTCG ACGACTGTGG CTGCTGCCGA GTGTGCGCTG CAGGGCGGGG 240 

AGAAACTTGC TACCGCACAG TCTCAGGCAT GGATGGCATG AAGTGTGGCC CGGGGCTGAG 300 

GTGTCAGCCT TCTAATGGGG AGGATCCTTT TGGTGAAGAG TTTGGTATCT GCAAAGACTG 360 

TCCCTACGGC ACCTTCGGGA TGGATTGCAG AGAGACCTGC AACTGCCAGT CAGGCATCTG 420

TGACAGGGGG ACGGGAAAAT GCCTGAAATT CCCCTTCTTC CAATATTCAG TAACCAAGTC 480 

TTCCAACAGA TTTGTTTCTC TCACGGAGCA TGACATGGCA TCTGGAGATG GCAATATTGT 540 

GAGAGAAGAA GTTGTGAAAG AGAATGCTGC CGGGTCTCCC GTAATGAGGA AATGGTTAAA 600

TCCACGCTGA TCCCGGCTGT GATTTCTGAG AGAAGGCTCT ATTTTCGTGA TTGTTCAACA 660

CACAGCCAAC ATTTTAGGAA CTTTCTAGAT ATAGCATAAG TACATGTAAT TTTTGAAGAT 720

CCAAATTGTG ATGCATGGTG GATCCAGAAA ACAAAAAGTA GGATACTTAC AATCCATAAC 780

ATCCATATGA CTGAACACTT GTATGTGTTT GTTAAATATT CGAATGCATG TAGATTTGTT 840

AAATGTGTGT GTATAGTAAC ACTGAAGAAC TAAAAATGCA ATTTAGGTAA TCTTACATGG 900

AGACAGGTCA ACCAAAGAGG GAGCTAGGCA AAGCTGAAGA CCGCAGTGAG TCAAATTAGT 960

TCTTTGACTT TGATGTACAT TAATGTTGGG ATATGGAATG AAGACTTAAG AGCAGGAGAA 1020

GATGGGGAGG GGGTGGGAGT GGGAAATAAA ATATTTAGCC CTTCCTTGGT AGGTAGCTTC 1080

TCTAGAATTT AATTGTGCTT TTTTTTTTTT TTTGGCTTTG GGAAAAGTCA AAATAAAACA 1140

ACCAGAAAAC CCCTGAAGGA AGTAAGATGT TTGAAGCTTA TGGAAATTTG AGTAACAAAC 1200

AGCTTTGAAC TGAGAGCAAT TTCAAAAGGC TGCTGATGTA GTTCCCGGGT TACCTGTATC 1260

TGAAGGACGG TTCTGGGGCA TAGGAAACAC ATACACTTCC ATAAATAGCT TTAACGTATG 1320

CCACCTCAGA GATAAATCTA AGAAGTATTT TACCCACTGG TGGTTTGTGT GTGTATGAAG 1380

GTAAATATTT ATATATTTTT ATAAATAAAT GTGTTAGTGC AAGTCATCTT CCCTACCCAT 1440

ATTTATCATC CTCTTGAGGA AAGAAATCTA GTATTATTTG TTGAAAATGG TTAGAATAAA 1500

AACCTATGAC TCTATAAGGT TTTCAAACAT CTGAGGCATG ATAAATTTAT TATCCATAAT 1560

TATAGGAGTC ACTCTGGATT TCAAAAAATG TCAAAAAATG AGCAACAGAG GGACCTTATT 1620

TAAACATAAG TGCTGTGACT TCGGTGAATT TTCAATTTAA GGTATGAAAA TAAGTTTTTA 1680

GGAGGTTTGT AAAAGAAGAA TCAATTTTCA GCAGAAAACA TGTCAACTTT AAAATATAGG 1740

TGGAATTAGG AGTATATTTG AAAGAATCTT AGCACAAACA GGACTGTTGT ACTAGATGTT 1800

CTTAGGAAAT ATCTCAGAAG TATTTTATTT GAAGTGAAGA ACTTATTTAA GAATTATTTC 1860

AGTATTTACC TGTATTTTAT TCTTGAAGTT GGCCAACAGA GTTGTGAATG TGTGTGGAAG 1920

GCCTTTGAAT GTAAAGCTGC ATAAGCTGTT AGGTTTTGTT TTAAAAGGAC ATGTTTATTA 1980

TTGTTCAATA AAAAAGAACA AGATAC



ACF4 DNA sequence

Gene name: P53-responsive gene 2 similar to D. melanogaster peroxidasin (U11052)
Unigene number: Hs.118893
Probeset Accession #: D86983
Nucleic Acid Accession #: D86983

Coding sequence: 1-4491 (predicted stop codon underlined, sequence is open at end)

AGCCGGCCGT GGTGGCTCCG TGCGTCCGAG CGTCCGTCCG CGCCGTCGGC CATGGCCAAG 60

CGCTCCAGGG GCCCCGGGCG CCGCTGCCTG TTGGCGCTCG TGCTGTTCTG CGCCTGGGGG 120

ACGCTGGCCG TGGTGGCCCA GAAGCCGGGC GCAGGGTGTC CGAGCCGCTG CCTGTGCTTC 180

CGCACCACCG TGCGCTGCAT GCATCTGCTG CTGGAGGCCG TGCCCGCCGT GGCGCCGCAG 240

ACCTCCATCC TAGATCTTCG CTTTAACAGA ATCAGAGAGA TCCAACCTGG GGCATTCAGG 300

CGGCTGAGGA ACTTGAACAC ATTGCTTCTC AATAATAATC AGATCAAGAG GATACCTAGT 360

GGAGCATTTG AAGACTTGGA AAATTTAAAA TATCTCTATC TGTACAAGAA TGAGATCCAG 420

TCAATTGACA GGCAAGCATT TAAGGGACTT GCCTCTCTAG AGCAACTATA CCTGCACTTT 480

AATCAGATAG AAACTTTGGA CCCAGATTCG TTCCAGCATC TCCCGAAGCT CGAGAGGCTA 540

TTTTTGCATA ACAACCGGAT TACACATTTA GTTCCAGGGA CATTTAATCA CTTGGAATCT 600

ATGAAGAGAT TGCGACTGGA CTCAAACACA CTTCACTGCG ACTGTGAAAT CCTGTGGTTG 660

GCGGATTTGC TGAAAACCTA CGCGGAGTCG GGGAACGCGC AGGCAGCGGC CATCTGTGAA 720

TATCCCAGAC GCATCCAGGG ACGCTCAGTG GCAACCATCA CCCCGGAAGA GCTGAACTGT 780

GAAAGGCCCC GGATCACCTC CGAGCCCCAG GACGCAGATG TGACCTCGGG GAACACCGTG 840

TACTTCACCT GCAGAGCCGA AGGCAACCCC AAGCCTGAGA TCATCTGGCT GCGAAACAAT 900

AATGAGCTGA GCATGAAGAC AGATTCCCGC CTAAACTTGC TGGACGATGG GACCCTGATG 960

ATCCAGAACA CACAGGAGAC AGACCAGGGT ATCTACCAGT GCATGGCAAA GAACGTGGCC 1020

GGAGAGGTGA AGACGCAAGA GGTGACCCTC AGGTACTTCG GGTCTCCAGC TCGACCCACT 1080

TTTGTAATCC AGCCACAGAA TACAGAGGTG CTGGTTGGGG AGAGCGTCAC GCTGGAGTGC 1140

AGCGCCACAG GCCACCCCCC GCCGCGGATC TCCTGGACGA GAGGTGACCG CACACCCTTG 1200

CCAGTTGACC CGCGGGTGAA CATCACGCCT TCTGGCGGGC TTTACATACA GAACGTCGTA 1260

CAGGGGGACA GCGGAGAGTA TGCGTGCTCT GCGACCAACA ACATTGACAG CGTCCATGCC 1320

ACCGCTTTCA TCATCGTCCA GGCTCTTCCT CAGTTCACTG TGACGCCTCA GGACAGAGTC 1380

GTTATTGAGG GCCAGACCGT GGATTTCCAG TGTGAAGCCA AGGGCAACCC GCCGCCCGTC 1440

ATCGCCTCNA CCAAGGGAGG GAGCCAGCTC TCCGTGGACC GGCGGCACCT GGTCCTGTCA 1500

TCGGGAAC C TTAGAATCTC TGGTGTTGCC CTCCACGACC AGGGCCAGTA CGAATGCCAG 1560

GCTGTCAACA TCATCGGCTC CCAGAAGGTC GTGGCCCACC TGACTGTGCA GCCCAGAGTC 1620

ACCCCAGTGT TTGCCAGCAT TCCCAGCGAC ACAACAGTGG AGGTGGGCGC CAATGTGCAG 1680

CTCCCGTGCA GCTCCCAGGG CGAGCCCGAG CCAGCCATCA CCTGGAACAA GGATGGGGTT 1740

CAGGTGACAG AAAGTGGAAA ATTTCACATC AGCCCTGAAG GATTCTTGAC CATCAATGAC 1800

GTTGGCCCTG CAGACGCAGG TCGCTATGAG TGTGTGGCCC GGAACACCAT TGGGTCGGCC 1860

TCGGTGAGCA TGGTGCTCAG TGTGAACGTT CCTGACGTCA GTCGAAATGG AGATCCGTTT 1920

GTAGCTACCT CCATCGTGGA AGCGATTGCG ACTGTTGACA GAGCTATAAA CTCAACCCGA 1980

ACACATTTGT TTGACAGCCG TCCTCGTTCT CCAAATGATT TGCTGGCCTT GTTCCGGTAT 2040

CCGAGGGATC CTTACACAGT TGAACAGGCA CGGGCGGGAG AAATCTTTGA ACGGACATTG 2100 

CAGCTCATTC AGGAGCATGT ACAGCATGGC TTGATGGTCG ACCTCAACGG AACAAGTTAC 2160 

CACTACAACG ACCTGGTGTC TCCACAGTAC CTGAACCTCA TCGCAAACCT GTCGGGCTGT 2220 

ACCGCCCACC GGCGCGTGAA CAACTGCTCG GACATGTGCT TCCACCAGAA GTACCGGACG 2280 

CACGACGGCA CCTGTAACAA CCTGCAGCAC CCCATGTGGG GCGCCTCGCT GACCGCCTTC 2340 

GAGCGCCTGC TGAAATCCGT GTACGAGAAT GGCTTCAACA CCCCTCGGGG CATCAACCCC 2400 

CACCGACTGT ACAACGGGCA CGCCCTTCCC ATGCCGCGCC TGGTGTCCAC CACCCTGATC 2460 

GGGACGGAGA CCGTCACACC CGACGAGCAG TTCACCCACA TGCTGATGCA GTGGGGCCAG 2520 

TTCCTGGACC ACGACCTCGA CTCCACGGTG GTGGCCCTGA GCCAGGCACG CTTCTCCGAC 2580 

GGACAGCACT GCAGCAACGT GTGCAGCAAC GACCCCCCCT GCTTCTCTGT CATGATCCCC 2640 

CCCAATGACT CCCGGGCCAG GAGCGGGGCC CGCTGCATGT TCTTCGTGCG CTCCAGCCCT 2700 

GTGTGCGGCA GCGGCATGAC TTCGCTGCTC ATGAACTCCG TGTACCCGCG GGAGCAGATC 2760 

AACCAGCTCA CCTCCTACAT CGACGCATCC AACGTGTACG GGAGCACGGA GCATGAGGCC 2820 

CGCAGCATCC GCGACCTGGC CAGCCACCGC GGCCTGCTGC GGCAGGGCAT CGTGCAGCGG 2880 

TCCGGGAAGC CGCTGCTCCC CTTCGCCACC GGGCCGCCCA CGGAGTGCAT GCGGGACGAG 2940 

AACGAGAGCC CCATCCCCTG CTTCCTGGCC GGGGACCACC GCGCCAACGA GCAGCTGGGC 3000

CTGACCAGCA TGCACACGCT GTGGTTCCGC GAGCACAACC GCATTGCCAC GGAGCTGCTC 3060 

AAGCTGAACC CGCACTGGGA CGGCGACACC ATCTACTATG AGACCAGGAA GATCGTGGGT 3120

GCGGAGATCC AGCACATCAC CTACCAGCAC TGGCTCCCGA AGATCCTGGG GGAGGTGGGC 3180 

ATGAGGACGC TGGGAGAGTA CCACGGCTAC GACCCCGGCA TCAATGCTGG CATCTTCAAC 3240 

GCCTTCGCCA CCGCGGCCTT CAGGTTTGGC CACACGCTTG TCAACCCACT GCTTTACCGG 3300

CTGGACGAGA ACTTCCAGCC CATTGCACAA GATCACCTCC CCCTTCACAA AGCTTTCTTC 3360 

TCTCCCTTCC GGATTGTGAA TGAGGGCGGC ATCGATCCGC TTCTCAGGGG GCTGTTCGGG 3420 

GTGGCGGGGA AAATGCGTGT GCCCTCGCAG CTGCTGAACA CGGAGCTCAC GGAGCGGCTG 3480 

TTCTCCATGG CACACACGGT GGCTCTGGAC CTGGCGGCCA TCAACATCCA GCGGGGCCGG 3540 

GACCACGGGA TCCCACCCTA CCACGACTAC AGGGTCTACT GCAATCTATC GGCGGCACAC 3600 

ACGTTCGAGG ACCTGAAAAA TGAGATTAAA AACCCTGAGA TCCGGGAGAA ACTGAAAAGG 3660 

TTGTATGGCT CGACACTCAA CATCGACCTG TTTCCGGCGC TCGTGGTGGA GGACCTGGTG 3720 
CCTGGCAGCC GGCTGGGCCC CACCCTGATG TGTCTTCTCA GCACACAGTT CAAGCGCCTG 3780

CGAGATGGGG ACAGGTTGTG GTATGAGAAC CCTGGGGTGT TCTCCCCGGC CCAGCTGACT 3840

CAGATCAAGC AGACGTCGCT GGCCAGGATC CTATGCGACA ACGCGGACAA CATCACCCGG 3900 

GTGCAGAGCG ACGTGTTCAG GGTGGCGGAG TTCCCTCACG GCTACGGCAG CTGTGACGAG 3960 

ATCCCCAGGG TGGACCTCCG GGTGTGGCAG GACTGCTGTG AAGACTGTAG GACCAGGGGG 4020

CAGTTCAATG CCTTTTCCTA TCATTTCCGA GGCAGACGGT CTCTTGAGTT CAGCTACCAG 4080 

GAGGACAAGC CGACCAAGAA AACAAGACCA CGGAAAATAC CCAGTGTTGG GAGACAGGGG 4140 

GAACATCTCA GCAACAGCAC CTCAGCCTTC AGCACACGCT CAGATGCATC TGGGACAAAT 4200 

GACTTCAGAG AGTTTGTTCT GGAAATGCAG AAGACCATCA CAGACCTCAG AACACAGATA 4260

AAGAAACTTG AATCACGGCT CAGTACCACA GAGTGCGTGG ATGCCGGGGG CGAATCTCAC 4320 

GCCAACAACA CCAAGTGGAA AAAAGATGCA TGCACCATTT GTGAATGCAA AGACGGGCAG 4380

GTCACCTGCT TCGTGGAAGC TTGCCCCCCT GCCACCTGTG CTGTCCCCGT GAACATCCCA 4440 

GGGGCCTGCT GTCCAGTCTG CTTACAGAAG AGGGCGGAGG AAAAGCCCTA GGCTCCTGGG 4500

AGGCTCCTCA GAGTTTGTCT GCTGTGCCAT CGTGAGATCG GGTGGCCGAT GGCAGGGAGC 4560 

TGCGGACTGC AGACCAGGAA ACACCCAGAA CTCGTGACAT TTCATGACAA CGTCCAGCTG 4620

GTGCTGTTAC AGAAGGCAGT GCAGGAGGCT TCCAACCAGA GCATCTGCGG AGAAGGAGGC 4680 

ACAGCAGGTG CCTGAAGGGA AGCAGGCAGG AGTCCTAGCT TCACGTTAGA CTTCTCAGGT 4740 

TTTTATTTAA TTCTTTTAAA ATGAAAAATT GGTGCTACTA TTAAATTGCA CAGTTGAATC 4800 

ATTTAGGCGC CTAAATTGGT TTTGCCTCCC AACACCATTT CTTTTTAAAT AAAGCAGGAT 4860

ACCTCTATAT GTCAGCCTTG CCTTGTTCAG ATGCCAGGAG CCGGCAGACC TGTCACCCGC 4920 

AGGTGGGGTG AGTCTCGGAG CTGCCAGAGG GGCTCACCGA AATCGGGGTT CCATCACAAG 4980

CTATGTTTAA AAAGAAAATT GGTGTTTGGC AAACGGAACA GAACCTTTGA TGAGAGCGTT 5040

CACAGGGACA CTGTCTGGGG GTGCAGTGCA AGCCCCCGGC CTCTTCCCTG GGAACCTCTG 5100 

AACTCCTCCT TCCTCTGGGC TCTCTGTAAC ATTTCACCAC ACGTCAGCAT CTAATCCCAA 5160

GACAAACATT CCCGCTGCTC GAAGCAGCTG TATAGCCTGT GACTCTCCGT GTGTCAGCTC 5220 

CTTCCACACC TGATTAGAAC ATTCATAAGC CACATTTAGA AACAGATTTG CTTTCAGCTG 5280

TCACTTGCAC ACATACTGCC TAGTTGTGAA CCAAATGTGA AAAAACCTCC TTCATCCCAT 5340

TGTGTATCTG ATACCTGCCG AGGGCCAAGG GTGTGTGTTG ACAACGCCGC TCCCAGCCGG 5400 

CCCTGGTTGC GTCCACGTCC TGAACAAGAG CCGCTTCCGG ATGGCTCTTC CCAAGGGAGG 5460 
AGGAGCTCAA GTGTCGGGAA CTGTCTAACT TCAGGTTGTG TGAGTGCGTT 



ACF5 DNA sequence 

Gene name: Mitogen-activated protein kinase kinase kinase kinase 4
Unigene number: Hs.3628
Probeset Accession #: N54067 
Nucleic Acid Accession #: NM_004834 

Coding sequence: 80-3577 (predicted start/stop codons underlined)
AATTCGAGGA TCCGGGTACC ATGGCACAGA GCGACAGAGA CATTTATTGT TATTTGTTTT 60 
TTGGTGGCAA AAAGGGAAAA TGGCGAACGA CTCCCCTGCA AAAAGTCTGG TGGACATCGA 120

CCTCTCCTCC CTGCGGGATC CTGCTGGGAT TTTTGAGCTG GTGGAAGTGG TTGGAAATGG 180 

CACCTATGGA CAAGTCTATA AGGGTCGACA TGTTAAAACG GGTCAGTTGG CAGCCATCAA 240 

AGTTATGGAT GTCACTGAGG ATGAAGAGGA AGAAATCAAA CTGGAGATAA ATATGCTAAA 300

GAAATACTCT CATCACAGAA ACATTGCAAC ATATTATGGT GCTTTCATCA AAAAGAGCCC 360

TCCAGGACAT GATGACCAAC TCTGGCTTGT TATGGAGTTC TGTGGGGCTG GGTCCATTAC 420

AGACCTTGTG AAGAACACCA AAGGGAACAC ACTCAAAGAA GACTGGATCG CTTACATCTC 480 

CAGAGAAATC CTGAGGGGAC TGGCACATCT TCACATTCAT CATGTGATTC ACCGGGATAT 540

CAAGGGCCAG AATGTGTTGC TGACTGAGAA TGCAGAGGTG AAACTTGTTG ACTTTGGTGT 600 

GAGTGCTCAG CTGGACAGGA CTGTGGGGCG GAGAAATACG TTCATAGGCA CTCCCTACTG 660 

GATGGCTCCT GAGGTCATCG CCTGTGATGA GAACCCAGAT GCCACCTATG ATTACAGAAG 720 

TGATCTTTGG TCTTGTGGCA TTACAGCCAT TGAGATGGCA GAAGGTGCTC CCCCTCTCTG 780 

TGACATGCAT CCAATGAGAG CACTGTTTCT CATTCCCAGA AACCCTCCTC CCCGGCTGAA 840 

GTCAAAAAAA TGGTCGAAGA AGTTTTTTAG TTTTATAGAA GGGTGCCTGG TGAAGAATTA 900 

CATGCAGCGG CCCTCTACAG AGCAGCTTTT GAAACATCCT TTTATAAGGG ATCAGCCAAA 960 

TGAAAGGCAA GTTAGAATCC AGCTTAAGGA TCATATAGAT CGTACCAGGA AGAAGAGAGG 1020 

CGAGAAAGAT GAAACTGAGT ATGAGTACAG TGGGAGTGAG GAAGAAGAGG AGGAAGTGCC 1080 

TGAACAGGAA GGAGAGCCAA GTTCCATTGT GAACGTGCCT GGTGAGTCTA CTCTTCGCCG 1140 

AGATTTCCTG AGACTGCAGC AGGAGAACAA GGAACGTTCC GAGGCTCTTC GGAGACAACA 1200 

GTTACTACAG GAGCAACAGC TCCGGGAGCA GGAAGAATAT AAAAGGCAAC TGCTGGCAGA 1260 

GAGACAGAAG CGGATTGAGC AGCAGAAAGA ACAGAGGCGA CGGCTAGAAG AGCAACAAAG 1320

GAGAGAGCGG GAGGCTAGAA GGCAGCAGGA ACGTGAACAG CGAAGGAGAG AACAAGAAGA 1380 

AAAGAGGCGT CTAGAGGAGT TGGAGAGAAG GCGCAAAGAA GAAGAGGAGA GGAGACGGGC 1440

AGAAGAAGAA AAGAGGAGAG TTGAAAGAGA ACAGGAGTAT ATCAGGCGAC AGCTAGAAGA 1500 

GGAGCAGCGG CACTTGGAAG TCCTTCAGCA GCAGCTGCTC CAGGAGCAGG CCATGTTACT 1560

GCATGACCAT AGGAGGCCGC ACCCGCAGCA CTCGCAGCAG CCGCCACCAC CGCAGCAGGA 1620 

AAGGAGCAAG CCAAGCTTCC ATGCTCCCGA GCCCAAAGCC CACTACGAGC CTGCTGACCG 1680 

AGCGCGAGAG GTTCCTGTGA GAACAACATC TCGCTCCCCT GTTCTGTCCC GTCGAGATTC 1740 

CCCACTGCAG GGCAGTGGGC AGCAGAATAG CCAGGCAGGA CAGAGAAACT CCACCAGTAT 1800 

TGAGCCCAGG CTTCTGTGGG AGAGAGTGGA GAAGCTGGTG CCCAGACCTG GCAGTGGCAG 1860 

CTCCTCAGGG TCCAGCAACT CAGGATCCCA GCCCGGGTCT CACCCTGGGT CTCAGAGTGG 1920 

CTCCGGGGAA CGCTTCAGAG TGAGATCATC ATCCAAGTCT GAAGGCTCTC CATCTCAGCG 1980 

CCTGGAAAAT GCAGTGAAAA AACCTGAAGA TAAAAAGGAA GTTTTCAGAC CCCTCAAGCC 2040 

TGCTGGCGAA GTGGATCTGA CCGCACTGGC CAAAGAGCTT CGAGCAGTGG AAGATGTACG 2100 

GCCACCTCAC AAAGTAACGG ACTACTCCTC ATCCAGTGAG GAGTCGGGGA CGACGGATGA 2160 

GGAGGACGAC GATGTGGAGC AGGAAGGGGC TGACGAGTCC ACCTCAGGAC CAGAGGACAC 2220 

CAGAGCAGCG TCATCTCTGA ATTTGAGCAA TGGTGAAACG GAATCTGTGA AAACCATGAT 2280

TGTCCATGAT GATGTAGAAA GTGAGCCGGC CATGACCCCA TCCAAGGAGG GCACTCTAAT 2340

CGTCCGCCAG ACTCAGTCCG CTAGTAGCAC ACTCCAGAAA CACAAATCTT CCTCCTCCTT 2400 

TACACCTTTT ATAGACCCCA GATTACTACA GATTTCTCCA TCTAGCGGAA CAACAGTGAC 2460

ATCTGTGGTG GGATTTTCCT GTGATGGGAT GAGACCAGAA GCCATAAGGC AAGATCCTAC 2520 

CCGGAAAGGC TCAGTGGTCA ATGTGAATCC TACCAACACT AGGCCACAGA GTGACACCCC 2580 

GGAGATTCGT AAATACAAGA AGAGGTTTAA CTCTGAGATT CTGTGTGCTG CCTTATGGGG 2640 

AGTGAATTTG CTAGTGGGTA CAGAGAGTGG CCTGATGCTG CTGGACAGAA GTGGCCAAGG 2700 

GAAGGTCTAT CCTCTTATCA ACCGAAGACG ATTTCAACAA ATGGACGTAC TTGAGGGCTT 2760 

GAATGTCTTG GTGACAATAT CTGGCAAAAA GGATAAGTTA CGTGTCTACT ATTTGTCCTG 2820 

GTTAAGAAAT AAAATACTTC ACAATGATCC AGAAGTTGAG AAGAAGCAGG GATGGACAAC 2880

CGTAGGGGAT TTGGAAGGAT GTGTACATTA TAAAGTTGTA AAATATGAAA GAATCAAATT 2940

TCTGGTGATT GCTTTGAAGA GTTCTGTGGA AGTCTATGCG TGGGCACCAA AGCCATATCA 3000 

CAAATTTATG GCCTTTAAGT CATTTGGAGA ATTGGTACAT AAGCCATTAC TGGTGGATCT 3060

CACTGTTGAG GAAGGCCAGA GGTTGAAAGT GATCTATGGA TCCTGTGCTG GATTCCATGC 3120 

TGTTGATGTG GATTCAGGAT CAGTCTATGA CATTTATCTA CCAACACATG TAAGAAAGAA 3180 

CCCACACTCT ATGATCCAGT GTAGCATCAA ACCCCATGCA ATCATCATCC TCCCCAATAC 3240

AGATGGAATG GAGCTTCTGG TGTGCTATGA AGATGAGGGG GTTTATGTAA ACACATATGG 3300

AAGGATCACC AAGGATGTAG TTCTACAGTG GGGAGAGATG CCTACATCAG TAGCATATAT 3360

TCGATCCAAT CAGACAATGG GCTGGGGAGA GAAGGCCATA GAGATCCGAT CTGTGGAAAC 3420 

TGGTCACTTG GATGGTGTGT TCATGCACAA AAGGGCTCAA AGACTAAAAT TCTTGTGTGA 3480

ACGCAATGAC AAGGTGTTCT TTGCCTCTGT TCGGTCTGGT GGCAGCAGTC AGGTTTATTT 3540

CATGACCTTA GGCAGGACTT CTCTTCTGAG CTGGTAGAAG CAGTGTGANN CAGGGATTAC 3600

TGGCCTCCAG AGTCTTCAAG ATCCTGAGAA CTTGGAATTC CTTGTAAC r GAGCTCGGAG 3660

CTGCACCGAG GGCAACCAGG ACAGCTGTGT GTGCAGACCT CATGTGTTCG GTTCTCTCCC 3720

CTCCTTCCTG TTCCTCTTAT ATACCAGTTT ATCCCCATTC TTTTTTTTTT TCTTACTCCA 3780 

AAATAAATCA AGGCTGCAAT GCAGCTGGTG CTGTTCAGAT TCCAAAAAAA AAAAAAAACC 3840
ATGGTACCCG GATCCTCGAA TTCC 



ACF8 DNA sequence 

Gene name: Phospholipase A2, group IVC (cytosolic, calcium-independent)
Unigene number: Hs.18858
Probeset Accession #: AA054087 
Nucleic Acid Accession #: NM_003706 

Coding sequence: 310-1935 (predicted start/stop codons underlined)

CACGAGGCAG GGGCCATTTT ACCTCCAGGT TGGCCCTGCT CAGGACCAGG AGGAAACACC 60 

TCCAGCCCGC GACCTCCTCC CACAGGGGGA AAAGGAAAGC AGGAGGACCA CAGAAGCTTT 120 

GGCACCGAGG ATCCCCGCAG TCTTCACCCG CGGAGATTCC GGCTGAAGGA GCTGTCCAGC 180 

GACTACACCG CTAAGCGCAG GGAGCCCAAG CCTCCGCACC GGATTCCGGA GCACAAGCTC 240 

CACCGCGCAT GCGCACACGC CCCAGACCCA GGCTCAGGAG GACTGAGAAT TTTCTGACCG 300 

CAGTGCACCA TGGGAAGCTC TGAAGTTTCC ATAATTCCTG GGCTCCAGAA AGAAGAAAAG 360

GCGGCCGTGG AGAGACGAAG ACTTCATGTG CTGAAAGCTC TGAAGAAGCT AAGGATTGAG 420

GCTGATGAGG CCCCAGTTGT TGCTGTGCTG GGCTCAGGCG GAGGACTGCG GGCTCACATT 480 

GCCTGCCTTG GGGTCCTGAG TGAGATGAAA GAACAGGGCC TGTTGGATGC CGTCACGTAC 540 

CTCGCAGGGG TCTCTGGATC CACTTGGGCA ATATCTTCTC TCTACACCAA TGATGGTGAC 600 

ATGGAAGCTC TCGAGGCTGA CCTGAAACAT CGATTTACCC GACAGGAGTG GGACTTGGCT 660

AAGAGCCTAC AGAAAACCAT CCAAGCAGCG AGGTCTGAGA ATTACTCTCT GACCGACTTC 720 

TGGGCCTACA TGGTTATCTC TAAGCAAACC AGAGAACTGC CGGAGTCTCA TTTGTCCAAT 780 

ATGAAGAAGC CCGTGGAAGA AGGGACACTA CCCTACCCAA TATTTGCAGC CATTGACAAT 840 

GACCTGCAAC CTTCCTGGCA GGAGGCAAGA GCACCAGAGA CCTGGTTCGA GTTCACCCCT 900 

CACCACGCTG GCTTCTCTGC ACTGGGGGCC TTTGTTTCCA TAACCCACTT CGGAAGCAAA 960 

TTCAAGAAGG GAAGACTGGT CAGAACTCAC CCTGAGAGAG ACCTGACTTT CCTGAGAGGT 1020 

TTATGGGGAA GTGCTCTTGG TAACACTGAA GTCATTAGGG AATACATTTT TGACCAGTTA 1080 

AGGAATCTGA CCCTGAAAGG TTTATGGAGA AGGGCTGTTG CTAATGCTAA AAGCATTGGA 1140 

CACCTTATTT TTGCCCGATT ACTGAGGCTG CAAGAAAGTT CACAAGGGGA ACATCCTCCC 1200 

CCAGAAGATG AAGGCGGTGA GCCTGAACAC ACCTGGCTGA CTGAGATGCT CGAGAATTGG 1260 

ACCAGGACCT CCCTGGAAAA GCAGGAGCAG CCCCATGAGG ACCCCGAAAG GAAAGGCTCA 1320 

CTCAGTAACT TGATGGATTT TGTGAAGAAA ACAGGCATTT GCGCTTCAAA GTGGGAATGG 1380 

GGGACCACTC ACAACTTCCT GTACAAACAC GGTGGCATCC GGGACAAGAT AATGAGCAGC 1440 

CGGAAGCACC TCCACCTGGT GGATGCTGGT TTAGCCATCA ACACTCCCTT CCCACTCGTG 1500

CTGCCCCCGA CGCGGGAGGT TCACCTCATC CTCTCCTTCG ACTTCAGTGC CGGAGATCCT 1560 

TTCGAGACCA TCCGGGCTAC CACTGACTAC TGCCGCCGCC ACAAGATCCC CTTTCCCCAA 1620 

GTAGAAGAGG CTGAGCTGGA TTTGTGGTCC AAGGCCCCCG CCAGCTGCTA CATCCTGAAA 1680 

GGAGAAACTG GACCAGTGGT GATACATTTT CCCCTGTTCA ACATAGATGC CTGTGGAGGT 1740 

GATATTGAGG CATGGAGTGA CACATACGAC ACATTCAAGC TTGCTGACAC CTACACTCTA 1800 

GATGTGGTGG TGCTACTCTT GGCATTAGCC AAGAAGAATG TCAGGGAAAA CAAGAAGAAG 1860 

ATCCTTAGAG AGTTGATGAA CGTGGCCGGG CTCTACTACC CGAAGGATAG TGCCCGAAGT 1920 

TGCTGCTTGG CATAGATGAG CCTCAGCTTC CAGGGCACTG TGGGCCTGTT GGTCTACTAG 1980 

GGCCCTGAAG TCCACCTGGC CTTCCTGTTC TTCACTCCCT TCAGCCACAC GCTTCATGGC 2040

CTTGAGTTCA CCTTGGCTGT CCTAACAGGG CCAATCACCA GTGACCAGCT AGACTGTGAT 2100 

TTTGATAGCG TCATTCAGAA GAAGGTGTCC AAGGAGCTGA AGGTGGTGAA ATTTGTCCTG 2160 

CAGGTCCCTC GGGAGATCCT GGAGCTGGAG CATGAGTGTC TGACAATCAG AAGCATCATG 2220 

TCCAATGTCC AGATGGCCAG AATGAATGTG ATAGTTCAGA CCAATGCCTT CCACTGCTCC 2280

TTTATGACTG CACTTCTAGC CAGTAGCTCT GCACAAGTTA GCTCTGTAGA AGTAAGAACT 2340

TGGGCTTAAA TCATGGGCTA TCTCTCCACA GCCAAGTGGA GCTCTGAGAA TACAACAAGT 2400 

GCTCAATAAA TGCTTGCTGA TTGACTGATG AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 2460 
AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAA 



ACG1 DNA sequence 

Gene name: Carbohydrate (chondroitin 6/keratan) sulfotransferase 1
Unigene number: Hs.104576
Probeset Accession #: AA868063 
Nucleic Acid Accession #: NM_003654

Coding sequence: 367-1602 (predicted start/stop codons underlined)

GGGGAGGGCG CGGGAGGCGG AGGATGCCGC CGCGGCTGCT GCCGCCGCCG CCACCCGCGG 60

GTCCCCGGCG ACCCTACTCC AGACCCGAGG ATGGAGCCGG CGCTGGGCGC TGCAGCTGCT 120 

CCCGGCGCGT CCCCGACCAG GTAGCTGGTG TCACTTCGGT GTGGTTGGAA GAAGACTTTC 180 

TCCCCAGCTG CATTCCCGGA GGCGCCCTTT CGACCTGGAG GCCGGGTCTG CTGGCCACAG 240 

GGCTGCCGCA CTGGCTGGGA CTGCCAGCTG GGCCTGGAGA CGCTGGTGGC TGTGGACTCC 300

CCAGCTTGGA GCAGTCCCTC TTTGACCTCA CCCCTTGGAG AAGCAGCCCC ATGAAGGTGC 360

CCAGCCATGC AATGTTCCTG GAAGGCCGTC CTCCTCCTTG CCCTGGCCTC CATTGCCATC 420 

CAGTACACGG CCATCCGCAC CTTCACCGCC AAGTCCTTTC ACACCTGCCC CGGGCTGGCA 480 

GAGGCCGGGC TGGCCGAGCG ACTGTGCGAG GAGAGCCCCA CCTTCGCCTA CAACCTCTCC 540

CGCAAGACCC ACATCCTCAT CCTGGCCACC ACGCGCAGCG GCTCCTCCTT CGTGGGCCAG 600

CTCTTCAACC AGCACCTGGA CGTCTTCTAC CTGTTTGAGC CCCTCTACCA CGTCCAGAAC 660

ACGCTCATCC CCCGCTTCAC CCAGGGCAAG AGCCCGGCCG ACCGGCGGGT CATGCTAGGC 720 
GCCAGCCGCG ACCTCCTGCG GAGCCTCTAC GACTGCGACC TCTACTTCCT GGAGAACTAC 780

ATCAAGCCGC CGCCGGTCAA CCACACCACC GACAGGATCT TCCGCCGCGG GGCCAGCCGG 840 

GTCCTCTGCT CCCGGCCTGT GTGCGACCCT CCGGGGCCAG CCGACCTGGT CCTGGAGGAG 900 

GGGGACTGTG TGCGCAAGTG CGGGCTACTC AACCTGACCG TGGCGGCCGA GGCGTGCCGC 960 

GAGCGCAGCC ACGTGGCCAT CAAGACGGTG CGCGTGCCCG AGGTGAACGA CCTGCGCGCC 1020 

CTGGTGGAAG ACCCGCGATT AAACCTCAAG GTCATCCAGC TGGTCCGAGA CCCCCGCGGC 1080

ATTCTGGCTT CGCGCAGCGA GACCTTCCGC GACACGTACC GGCTCTGGCG GCTCTGGTAC 1140 

GGCACCGGGA GGAAACCCTA CAACCTGGAC GTGACGCAGC TGACCACGGT GTGCGAGGAC 1200 

TTCTCCAACT CCGTGTCCAC CGGCCTCATG CGGCCCCCGT GGCTCAAGGG CAAGTACATG 1260 

TTGGTGCGCT ACGAGGACCT GGCTCGGAAC CCTATGAAGA AGACCGAGGA GATCTACGGG 1320

TTCCTGGGCA TCCCGCTGGA CAGCCACGTG GCCCGCTGGA TCCAGAACAA CACGCGGGGC 1380

GACCCCACCC TGGGCAAGCA CAAATACGGC ACCGTGCGAA ACTCGGCGGC CACGGCCGAG 1440

AAGTGGCGCT TCCGCCTCTC CTACGACATC GTGGCCTTTG CCCAGAACGC CTGCCAGCAG 1500 

GTGCTGGCCC AGCTGGGCTA CAAGATCGCC GCCTCGGAGG AGGAGCTGAA GAACCCCTCG 1560 

GTCAGCCTGG TGGAGGAGCG GGACTTCCGC CCCTTCTCGT GACCCGGGCG GTGCGGGTGG 1620

GGGCGGGAGG CGCAAGGTGT CGGTTTTGAT AAAATGGACC GTTTTTAACT GTTGCCTTAT 1680 

TAACCCCTCC CTCTCCCACC TCATCTTCGT GTCCTTCCTG CCCCCAGCTC ACCCCACTCC 1740 

CTTCTGCCCC TTTTTTGTCT CTGAAATTTG CACTACGTCT TGGACGGGAA TCACTGGGGC 1800 

AGAGGGCGCC TGAAGTAGGG TCCCGCCCCC CCCACCCCAT TCAGACACAT GGATGTTGGG 1860 

TCTCTGTGCG GACGGTGACA ATGTTTACAA GCACCACATT TACACATCCA CACACGCACA 1920 

CGGGCACTCG CGAGGCGACT TCTCAAGCTT TTGAATGGGT GAGTGGTCGG GTATCTAGTT 1980

TTTGCACTGT CTTACTATTC AAGGTAAGAG GATACAAACA AGAGGACCAC TTGTCTCTAA 2040 

TTTATGAATG GTGTCCATCC TTTCCCCATC CCTGCCTCCT GCCCCTGACG CCCATTTCCC 2100 

CCCTTAGAGC AGCGAAACTG CCCCCTCCTG CCCGCCCTTG CCTGTCGGTG AGGCAGGTTT 2160 

TTACTGTGAG GTGAACGTGG ACCTGTTTCT GTTTCCAGTC TGTGGTGATG CTGTCTGTCT 2220 

GTCTGAGTCT CGTGGCCGCC CCTGGACCAG TGATGACTGA TGAATCTTAT GAGCTTCTGA 2280 

TTGATCTCGG GGTCCATCTG TGATATTTCT TTGTGCCAAA AAGAAAAAAA AAGAGTGGAT 2340 

CAGTTTGCTA AATGAACATT GAAATTGAAA TGCTTTATCT GTGTTTTCTG TAAATAAAAG 2400 
AGTGCAATAA TCACC



ACG5 DNA sequence 
Gene name: Multimerin 
Unigene number: Hs.268107
Probeset Accession #: U27109 
Nucleic Acid Accession #: U27109.1 

Coding sequence: 72-3758 (predicted start/stop codons underlined)

CTGCTATCAA AAAGGCCATA AGGATTTTGT CCCCAAATTT CACATGAGCT ACCTTGCTTC 60 

AAACTACTGA GATGAAGGGG GCAAGATTAT TTGTCCTTCT TTCTAGTTTA TGGAGTGGGG 120

GCATTGGGCT TAACAACAGT AAGCATTCTT GGACTATACC TGAGGATGGG AACTCTCAGA 180

AGACTATGCC TTCTGCTTCA GTTCCTCCAA ATAAAATACA AAGTTTGCAA ATACTGCCAA 240

CCACTCGGGT CATGTCGGCG GAGATAGCTA CAACTCCAGA GGCAAGAACT TCTGAAGACA 300 

GTCTTCTTAA ATCAACACTG CCTCCCTCAG AAACAAGTGC ACCTGCTGAG GGTGTGAGAA 360 

ATCAAACTCT CACATCCACA GAGAAAGCAG AAGGAGTGGT CAAGTTACAG AATCTTACCC 420 

TCCCAACCAA CGCTAGCATC AAGTTCAATC CTGGAGCAGA ATCAGTGGTC CTTTCCAATT 480

CTACACTGAA ATTTCTTCAG AGCTTTGCCA GAAAGTCAAA TGAACAAGCA ACTTCTCTAA 540 

ACACAGTTGG AGGCACTGGA GGCATTGGAG GCGTTGGAGG CACTGGAGGC GTGGGAAATC 600 

GAGCCCCACG GGAAACATAC CTCAGCCGGG GTGACAGCAG TTCCAGCCAA AGAACTGACT 660 

ACCAAAAATC AAATTTCGAA ACAACTAGAG GAAAGAATTG GTGTGCTTAT GTACATACCA 720 

GGTTATCTCC CACAGTGACA TTGGACAACC AGGTCACTTA TGTCCCAGGT GGGAAAGGAC 780 

CTTGTGGCTG GACCGGTGGA TCCTGTCCTC AGAGATCTCA GAAGATATCC AATCCTGTCT 840 

ATAGGATGCA ACATAAAATT GTCACCTCAT TGGATTGGAG GTGCTGTCCT GGATACAGTG 900 

GGCCGAAATG TCAACTAAGA GCCCAGGAAC AGCAAAGTTT GATACACACC AACCAGGCTG 960

AAAGTCATAC AGCTGTTGGC AGAGGAGTAG CTGAGCAGCA GCAGCAGCAA GGCTGTGGTG 1020 

ACCCAGAAGT GATGCAAAAA ATGACTGATC AGGTGAACTA CCAGGCAATG AAACTGACTC 1080 

TTCTGCAGAA GAAGATTGAC AATATTTCTT TGACTGTGAA TGATGTAAGG AACACTTACT 1140 

CCTCCCTAGA AGGAAAAGTC AGCGAAGATA AAAGCAGAGA ATTTCAATCT CTTCTAAAAG 1200 

GTCTAAAATC CAA^GCATT AATGTACTGA TAAGAGACAT AGTAAGAGAA CAATTTAAAA 1260 

TTTTTCAAAA TGAt~TGCAA GAGACTGTAG CACAGCTCTT CAAGACTGTA TCAAGTCTAT 1320

CAGAGGACCT CGAAAGCACC AGGCAAATAA TTCAAAAAGT TAATGAATCT GTGGTTTCAA 1380

TAGCAGCCCA GCAAAAGTTT GTTTTGGTGC AAGAGAATCG GCCCACTTTG ACTGATATAG 1440

TGGAACTAAG GAATCACATT GTGAATGTAA GGCAAGAAAT GACTCTTACA TGTGAGAAGC 1500 

CTATTAAAGA ACTAGAAGTA AAGCAGACTC ATTTAGAAGG TGCTCTAGAA CAGGAACACT 1560 

CAAGAAGCAT TCTGTATTAT GAATCCCTCA ATAAAACTCT TTCTAAATTG AAGGAAGTAC 1620

ATGAGCAGCT TTTATCAACT GAACAGGTAT CAGACCAGAA GAATGCTCCA GCTGCTGAGT 1680

CAGTTAGCAA TAATGTCACT GAGTACATGT CTACTTTACA TGAAAATATA AAGAAGCAGA 1740

GTTTGATGAT GCTGCAAATG TTTGAAGATT TGCACATTCA AGAAAGCAAG ATTAACAATC 1800 
TCACCGTCTC TTTGGAGATG GAGAAAGAGT CTCTCAGAGG TGAATGTGAA GACATGTTAT 1860

CCAAATGCAG AAATGATTTT AAATTTCAAC TTAAGGACAC AGAAGAGAAT TTACATGTGT 1920

TAAATCAAAC ATTGGCTGAA GTTCTCTTTC CAATGGACAA TAAGATGGAC AAAATGAGTG 1980

AGCAACTAAA TGATTTGACT TATGATATGG AGATCCTTCA ACCCTTGCTT GAGCAGGGAG 2040

CATCACTCAG ACAGACAATG ACATATGAAC AACCAAAGGA AGCAATAGTG ATAAGGAAAA 2100

AGATAGAAAA TCTGACTAGT GCTGTCAATA GTCTAAATTT TATTATCAAA GAACTTACAA 2160

AAAGACACAA CTTACTTAGA AATGAAGTAC AGGGTCGTGA TGATGCCTTA GAAAGACGTA 2220

TCAATGAATA TGCCTTAGAA ATGGAAGATG GCCTCAATAA GACAATGACT ATTATAAATA 2280

ATGCTATTGA TTTCATTCAA GATAACTATG CCCTAAAAGA GACTTTAAGT ACTATTAAGG 2340

ATAATAGTGA GATCCATCAT AAATGTACCT CCGATATGGA AACTATTTTG ACATTTATTC 2400

CTCAGTTCCA CCGTCTGAAT GATTCTATTC AGACTTTGGT CAATGACAAT CAGAGATATA 2460

ACTTTGTTTT GCAAGTCGCC AAGACCCTTG CAGGTATTCC CAGAGATGAG AAACTAAATC 2520

AGTCCAACTT CCAAAAGATG TATCAAATGT TCAATGAAAC CACTTCCCAA GTGAGAAAAT 2580

ACCAGCAAAA TATGAGTCAT TTGGAAGAAA AACTACTCTT AACTACCAAG ATTTCCAAAA 2640

ATTTTGAGAC TCGGTTGCAA GACATTGAGT CTAAAGTTAC CCAGACGCTC ATACCTTATT 2700

ATATTTCAGT TAAAAAAGGC AGTGTAGTTA CAAATGAGAG AGATCAGGCT CTTCAACTGC 2760

AAGTATTAAA TTCCAGATTT AAGGCGTTGG AAGCAAAATC TATCCATCTT TCAATTAACT 2820

TCTTTTCGCT TAACAAAACT CTCCACGAAG TTTTAACAAT GTGTCACAAT GCTTCTACAA 2880

GTGTGTCAGA ACTGAATGCT ACCATCCCTA AGTGGATAAA ACATTCCCTG CCAGATATTC 2940

AACTTCTTCA GAAAGGTCTA ACAGAATTTG TGGAACCAAT AATTCAAATA AAAACTCAAG 3000

CTGCCCTATC TAATTCAACT TGTTGTATAG ATCGATCGTT GCCTGGTAGT CTGGCAAATG 3060

TTGTCAAGTC TCAGAAGCAA GTAAAATCAT TGCCAAAGAA AATTAACGCA CTTAAGAAAC 3120

CAACGGTAAA TCTTACCACA GTCCTGATAG GCCGGACTCA AAGAAACACG GACAACATAA 3180

TATATCCTGA GGAGTATTCA AGCTGTAGTC GGCATCCGTG CCAAAATGGG GGCACGTGCA 3240

TAAATGGAAG AACTAGCTTT ACCTGTGCCT GCAGACATCC TTTTACTGGT GACAACTGCA 3300

CTATCAAGCT TGTGGAAGAA AATGCTTTAG CTCCAGATTT TTCCAAAGGA TCTTACAGAT 3360

ATGCACCCAT GGTGGCATTT TTTGCATCTC ATACGTATGG AATGACTATA CCTGGTCCTA 3420

TCCTGTTTAA TAACTTGGAT GTCAATTATG GAGCTTCATA TACCCCAAGA ACTGGAAAAT 3480

TTAGAATTCC GTATCTTGGA GTATATGTTT TCAAGTACAC CATCGAGTCA TTTAGTGCTC 3540

ATATTTCTGG ATTTTTAGTG GTTGATGGAA TAGACAAGCT TGCATTTGAG TCTGAAAATA 3600

TTAACAGTGA AATACACTGT GATAGGGTTT TAACTGGGGA TGCCTTATTA GAATTAAATT 3660

ATGGGCAGGA AGTCTGGTTA CGACTTGCAA AAGGAACAAT TCCAGCCAAG TTTCCCCCTG 3720

TTACTACATT TAGTGGCTAT TTATTATATC GTACATAAGT TAGTATGAAA AACAGACTAT 3780

CACCTTTATT GAGAAACAGC CAGTGTTTTC ATTTATCTTT GCTTGCACAT CTGCTCTGTT 3840

TTGGTTTTTC TACAGGAAAT GAAAATCAAC TTGTTTTTTT AATATGAGTA AACTTGTATG 3900

TCTATTTTAT AAAATTATTT GAATATTGTT TAATGTCTGA ATATGAAAGA GTTCTTGATC 3960

CTAAAGAAAT TTAGTGGCAC AGAAAACAAA GTGAATTTGT TAGCATAATT ATTCCTATTC 4020

TTATTTCTTC ATTTTAAGTC ATTGCAATGG AAAGTAATAT TATAAAACGG TAATTACAAC 4080

ATATTATCAG TCACAGTTTT CTTTCCAATT AAACACTTAA CTTTTGTTAT TCCCTGTATA 4140

TAAATATATA ACACACATTT TCTAGATTCA CAAATTTAAA TAAATTACTC AAAAAATG



ACC6 DNA sequence 

Gene name: Homo sapiens cDNA FLJ11502 fis, clone HEMBA1002102, weakly similar to ANKYRIN

Unigene number: Hs.213194
Probeset Accession #: AA187101 
Nucleic Acid Accession #: AK021564

Coding sequence: 1-450 (predicted stop codon underlined, 5' end sequence is open) 


GTCGCCGCGC GGCCGCCGGT GAGCCGCATG GAGCCCCGGG CGGCGGACGG CTGCTTCCTG 60 

GGCGACGTGG GTTTCTGGGT GGAGCGGACC CCTGTGCACG AGGCAGCCCA GCGGGGTGAG 120 

AGCCTGCAGC TGCAACAGCT GATCGAGAGC GGCGCCTGCG TGAACCAGGT CACCGTGGAC 180 

TCCATCACGC CCCTGCACGC AGCCAGTCTG CAGGGCCAGG CGCGGTGTGT GCAGCTGCTG 240

CTGGCGGCTG GGGCCCAGGT GGATGCTCGC AACATCGACG GCAGCACCCC GCTCTGCGAT 300

GCCTGCGCCT CGGGCAGCAT CGAGTGTGTG AAGCTCTTGC TGTCCTACGG GGCCAAGGTC 360 

AACCCTCCCC TGTACACAGC GTCCCCCCTG CACGAGGCCA GCTTTCCCCG CCTCCTGAGC 420 

ACCCTGGCTT CGACGCCCTG GATCAACTGA GCCAGGTGGA ACTCCTGGGG GACATGGATC 480

GCAATGAATT CGACCAGTAT TTGAACACTC CTGGP~ACCC AGACTCCGCC ACAGGGGCCA 540 

TGGCCCTCAG TGGGCATGTT CCGGTCTCCC AGGTv^CACC AACGGGTCCC ACAGAGACCA 600

GCCTCATCTC CGTCCTGGCT GATGCCACGG CCACGTACTA CAACAGCTAC AGTGTGTCAT 660 

AGAGCTGGAG GCGCCCCGTC CGGTCAGCCC TCGCGCCCTC TCCTTCTTGT GCCTTGAGTG 720 

GCAGAGGAGC CGTCCAGCCA CACCAGCTTT CCTCCCACCG CTCAGGGCAG GGAGGTCTGA 780 

ACTGCGGCCC CAGAGCCTTT GGCCTAAGCT GGACTCTCCT TATCCGAGTG CCGCCTCTAT 840 

CCCCTTCCCC ACGTTCCAGC CCCTGCAGCC CACATTTTAA GTATATTCCT TCAAGTGAGT 900

TTTCCTCCAG CCCCTGAGAG TTGCTGTCTC CCAGTGGAAT GTTCACTGAC GTCTTTTCTT 960 

GGTAGCCATC ATCGAAACTA ATGGGGGGAC AGACTTGATA GCCAAGGTCC CTTCTGGTCC 1020 

AGTTTTCTGA TTTAGGGTTC TCTCAAGATT AATAAAGGAA GATGGGGAAA TTTGACTCAT 1080
TAATGAGCTC GCTAACCTAC GATCTGGTGA TAATTTTGTG TGCACAGCCC AAGGACCACG 1140

AGGCTTTCTG CACTTTCTGC ACCCCCTTCC AAAGTGACCA CAAAATTTCA AAGGGACTCA 1200 

TACAATTTGA GAAAAAACAG TCAACCTGAT TTGAGAAATT AACCAGTATG GCTAACTATA 1260 

TCACAGAAAA TGGGATTGAG TTAAAACTAT TTTATTTTAA ATATACATTT TAAAGCAGTT 1320 

CTTTTTTTTT TGTTAATTTG TTTATTATAC ACACACTTCA AGAGAATATG CACAGTCTAG 1380 

GCCGGGCACG GTGGCTCACG CCTGTAATCC CAGCACTTTG GGAGGCCGAG GCATGTGGAT 1440 

CACCTGAGGT CAGGAGTTTG AGACCAGCCT AGACAACATG GTGAAACCTT GTCTCTATGA 1500

AAAATACAAA ATTTGCTGGG AGTGGTGGTG CATGCCTGTA ATCCCAGCTA CTTGGAAGGC 1560 

TGAGGCAGGA GAATGTCTTG AACCTAGGAG GTGGAGGTTG CAGTGAGCTG AGATTGCACC 1620 
ATTGCACTCC AGCCTGTGCA ACAAGAGTGA AACTCCATTT CAAG 



ACC7 DNA sequence 
Gene name: Human RAL A gene 
Unigene number: Hs.6906 
Probeset Accession #: AA083572

Nucleic Acid Accession #: contig of X15014.1 and AK026850 
Coding sequence: 1-621 (predicted start/stop codons underlined) 

ATGGCTGCAA ATAAGCCCAA GGGTCAGAAT TCTTTGGCTT TACACAAAGT CATCATGGTG 60 

GGCAGTGGTG GCGTGGGCAA GTCAGCTCTG ACTCTACAGT TCATGTACGA TGAGTTTGTG 120 

GAGGACTATG AGCCTACCAA AGCAGACAGC TATCGGAAGA AGGTAGTGCT AGATGGGGAG 180 

GAAGTCCAGA TCGATATCTT AGATACAGCT GGGCAGGAGG ACTACGCTGC AATTAGAGAC 240 

AACTACTTCC GAAGTGGGGA GGGGTTCCTC TGTGTTTTCT CTATTACAGA AATGGAATCC 300

TTTGCAGCTA CAGCTGACTT CAGGGAGCAG ATTTTAAGAG TAAAAGAAGA TGAGAATGTT 360 

CCATTTCTAC TGGTTGGTAA CAAATCAGAT TTAGAAGATA AAAGACAGGT TTCTGTAGAA 420 

GAGGCAAAAA ACAGAGCTGA GCAGTGGAAT GTTAACTACG TGGAAACATC TGCTAAAACA 480 

CGAGCTAATG TTGACAAGGT ATTTTTTGAT TTAATGAGAG AAATTCGAGC GAGAAAGATG 540

GAAGACAGCA AAGAAAAGAA TGGAAAAAAG AAGAGGAAAA GTTTAGCCAA GAGAATCAGA 600 

GAAAGATGCT GCATTTTATA ATCAAAGCCC AAACTCCTTT CTTATCTTGA CCATACTAAT 660

AAATATAATT TATAAGCATT GCCATTGAAG GCTTAATTGA CTGAAATTAC TTTAACATTT 720 

TGGAAATTGT TGTATATCAC TAAAAGCATG AATTGGAACT GCAATGAAAG TCAAATTTAC 780 

TTTAAAAAGA AATTAATATG GCTTCACCAA GAAGCAAAGT TCAACTTATT TCATAATTGC 840

CTACATTTAT CATGGTCCTG AATGTAGCGT GTAAGCTTGT GTTTCTTGGG CAGTCTTTCT 900

TGAAATTGAA GAGGTGAAAT GGGGGTGGGG AGTGGGAGGA AAGGTGACTT CCTCTGGTGT 960 

TTATTATAAA GCTTAAATTT TATATCATTT TAAAATGTCT TGGTCTTCTA CTGCCTTGAA 1020 

AAATGACAAT TGTGAACATG ATAGTTAAAC TACCACTTTT TTTAACCATT ATTATGCAAA 1080 

ATTTAGAAGA AAAGTTATTG GCATGGTTGT TGCATATAGT TAAACTGAGA GTAATTCATC 1140

TGTGAATCTG CTTTAATTAC CTGGTGAGTA ACTTAGAAAA GTGGTGTAAA CTTGTACATG 1200 

GAATTTTTTG AATATGCCTT AATTTAGAAA CTGAAAAATA TCCGGTTATA TCATTCTGGG 1260 

TGTGTTCTTA CTGACACCAG GGGTCCGCTG CCCCATGTGT CCTGGTGAGA AAATATATGC 1320 

CTGGCACAGC TTTTGTATAG AAAATTCTTG AGAAGTAACT GTCCGCTAGA AGTCTGTCCA 1380 

AATTTAAAAT GTGTGCCATA TTCTGGTTCT TGAAAATAAG ATTCCAGAGC TCTTTGATCG 1440 

CTTTTAATAA ACTGCAAGTT CATTTTAATT GAAGGGCCAG CATATATACT TGCAAGATAA 1500 

TTTTCAGCTG QAAGGATTCA GCACCAGTTA TGTTTGAATG AACCCTCCTT TTCTCTGAGA 1560 

TTCTGGTCCC TGGAAATCCC TTTCTGCTAG TGGTGAGCAT GTAAGTGTTA AGTTTTTAAT 1620 

CTGGGAGCAG GGCATAGGAA GAAAATGTCA GTAGTGCTAA TGCATTTTGC ACTAGAACGC 1680 

TTCGGGAAAA TATTCATGCT TGCCATCTGT TCATTTCTAA ATTTATATTC ATAAAGTTAC 1740

AGTTTGATAC AGGAATTATT AGGAGTAATT CTTTTCTGTT TCTGTTTATA ATGAAGAACA 1800 

CTGTAGCTAC ATTTTCAGAA GTTAACATCA AGCCATCAAA CCTGGGTATA GTGCAGAAGA 1860 

CGTGGCACAC ACTGACCACA CATTAGGCTG TGTCACCATT GTGTGGTGTA CCTGCTGGAA 1920 

GAATTCTAGC ATGCTACTTG GGGACATAAT TTCAGTGGGA AATATGCCAC TGACCGATTT 1980

TTTTTTTTTT CCTCTTTGCA GTGGGGCTAG GACAGTTGAT TCAACAAAGT ATTTTTTTCT 2040

TTTTTCTCAG TCCTAATTTG GACAGGTCAA AGATGTGTTC AGGCATTCCA GGTAACAGGT 2100

GTGTATGTAA AGTTAAAAAT AGGCTTTTTA GGAACTCACT CTTTAGATAT TTACATCCAG 2160 

CTTCTCATGT TAAATATTTG TCCTTAAAGG GTTTGAGATG TACATCTTTC ATTTCGTATT 2220 

TCTCATAGGC TATGCCATGT GCGGAATTCA AGTTACCAAT GTAACACTGG CCAGCGGGCC 2280

CAGCAATCTC CATGTGTACT TATTACAGTC TTATTTAACC AGGGGTCCTA ACCACTAACA 2340 

TTGTGACTTT GCTTTGAGAC CTTTCCTCTC CTGGGTACTG AGGTGCTATG AAGCGA?CTG 2400

ACAAAGATGC ATCACGTGTC TTAGGCTGAT GCCACTACCC GATTTGTTTA TTTGCJVTTT 2460

GAGCCATTTA AAGACCAATA AACTTCCTTT TTTAAAAAAA AAAAAAAAAA AAAAAAAAAA 2520 
A 



ACC9 DNA sequence 
Gene name: KIAA0955 protein 
Unigene number: Hs.10031
Probeset Accession #: AA027168 
Nucleic Acid Accession #: AB023172

Coding sequence: 314-1609 (predicted start/stop codons underlined) 

CTGGTTCTCA ACTTCTTTTG AAATAATGTT CATAGAGAAG GAGGGCTGTC TGAGATTCGA 60 

GGGAAACAAG CTCTCAGGAC TTCCGGTCGC CATGATGGCT GTGGGCGGTA AACGCGGTTA 120 

GTGCAAGCAT CTGGGCCATC TTCAATGGTA AAAAAGATAC AGTAAAGACA TAAATACCAC 180 

ATTTGACAAA TGGAAAAAAA GGAGTGTCCA GAAAAGAGTA GCAGCAGTGA GGAAGAGCTG 240

CCGAGACGGG TATACAGGGA GCTACCCTGT GTTTCTGAGA CCCTTTGTGA CATCTCACAT 300 

TTTTTCCAAG AAGATGATGA GACAGAGGCA GAGCCATTAT TGTTCCGTGC TGTTCCTGAG 360 

TGTCAACTAT CTGGGGGGGA CATTCCCAGG AGACATTTGC TCAGAAGAGA ATCAAATAGT 420 

TTCCTCTTAT GCTTCTAAAG TCTGTTTTGA GATCGAAGAA GATTATAAAA ATCGTCAGTT 480

TCTGGGGCCT GAAGGAAATG TGGATGTTGA GTTGATTGAT AAGAGCACAA ACAGATACAG 540 

CGTTTGGTTC CCCACTGCTG GCTGGTATCT GTGGTCAGCC ACAGGCCTCG GCTTCCTGGT 600 

AAGGGATGAG GTCACAGTGA CGATTGCGTT TGGTTCCTGG AGTCAGCACC TGGCCCTGGA 660

CCTGCAGCAC CATGAACAGT GGCTGGTGGG CGGCCCCTTG TTTGATGTCA CTGCAGAGCC 720 

AGAGGAGGCT GTCGCCGAAA TCCACCTCCC CCACTTCATC TCCCTCCAAG GTGAGGTGGA 780 

CGTCTCCTGG TTTCTCGTTG CCCATTTTAA GAATGAAGGG ATGGTCCTGG AGCATCCAGC 840 

CCGGGTGGAG CCTTTCTATG CTGTCCTGGA AAGCCCCAGC TTCTCTCTGA TGGGCATCCT 900 

GCTGCGGATC GCCAGTGGGA CTCGCCTCTC CATCCCCATC ACTTCCAACA CATTGATCTA 960 

TTATCACCCC CACCCCGAAG ATATTAAGTT CCACTTGTAC CTTGTCCCCA GCGACGCCTT 1020 

GCTAACAAAG GCGATAGATG ATGAGGAAGA TCGCTTCCAT GGTGTGCGCC TGCAGACTTC 1080 

GCCCCCAATG GAACCCCTGA ACTTTGGTTC CAGTTATATT GTGTCTAATT CTGCTAACCT 1140 

GAAAGTAATG CCCAAGGAGT TGAAATTGTC CTACAGGAGC CCTGGAGAAA TTCAGCACTT 1200

CTCAAAATTC TATGCTGGGC AGATGAAGGA ACCCATTCAA CTTGAGATTA CTGAAAAAAG 1260 

ACATGGGACT TTGGTGTGGG ATACTGAGGT GAAGCCAGTG GATCTCCAGC TTGTAGCTGC 1320

ATCAGCCCCT CCTCCTTTCT CAGGTGCAGC CTTTGTGAAG GAGAACCACC GGCAACTCCA 1380

AGCCAGGATG GGGGACCTGA AAGGGGTGCT CGATGATCTC CAGGACAATG AGGTTCTTAC 1440

TGAGAATGAG AAGGAGCTGG TGGAGCAGGA AAAGACACGG CAGAGCAAGA ATGAGGCCTT 1500 

GCTGAGCATG GTGGAGAAGA AAGGGGACCT GGCCCTGGAC GTGCTCTTCA GAAGCATTAG 1560 

TGAAAGGGAC CCTTACCTCG TGTCCTATCT TAGACAGCAG AATTTGTAAA ATGAGTCAGT 1620 

TAGGTAGTCT GGAAGAGAGA ATCCAGCGTT CTCATTGGAA ATGGATAAAC AGAAATGTGA 1680 

TCATTGATTT CAGTGTTCAA GACAGAAGAA GACTGGGTAA CATCTATCAC ACAGGCTTTC 1740 

AGGACAGACT TGTAACCTGG CATGTACCTA TTGACTGTAT CCTCATGCAT TTTCCTCAAG 1800

AATGTCTGAA GAAGGTAGTA ATATTCCTTT TAAATTTTTT CCAACCATTG CTTGATATAT 1860 

CACTATTTTA TCCATTGACA TGATTCTTGA AGACCCAGGA TAAAGGACAT CCGGATAGGT 1920 

GTGTTTATGA AGGATGGGGC CTGGAAAGGC AACTTTTCCT GATTAATGTG AAAAATAATT 1980 

CCTATGGACA CTCCGTTTGA AGTATCACCT TCTCATAACT AAAAGCAGAA AAGCTAACAA 2040

AAGCTTCTCA GCTGAGGACA CTCAAGGCAT ACATGATGAC AGTCTTTTTT TTTTTTGTAT 2100 

GTTAGGACTT TAACACTTTA TCTATGGCTA CTGTTATTAG AACAATGTAA ATGTATTTGC 2160 

TGAAAGAGAG CACAAAAATG GGAGAAAATG CAAACATGAG CAGAAAATAT TTTCCCACTG 2220 

GTGTGTAGCC TGCTACAAGG AGTTGTTGGG TTAAATGTTC ATGGTCAACT CCAAGGAATA 2280 

CTGAGATGAA ATGTGGTAAA TCAACTCCAC AGAACCACCA AAAAGAAAAT GAGGGTAATT 2340 

CAGCTTATTC TGAGACAGAC ATTCCTGGCA ATGTACCATA CAAAAAATAA GCCAACTCTG 2400 

ACATTTGGAT TCTACCATAG ACTCTGTCAT TTTGTAGCCA TTTCAGCTGT CTTTTGATTA 2460

ATGTTTTCGT GGCACACATA TTTCCATCCT TTTATGTTTA ATCTGTTTAA AACAAGTTCC 2520 

TAGTAGACAC CATCTGGTTG AGTCAGTTTT TTTTATGGTG TATTTTGAAC CCATTCTGAT 2580 

AGTCTCTTTT AACTGGAAGA TTTCAATTAC TTACGTTAAT GTAATTATTA ATATGTTAGG 2640

ATTTATCCTC AGTCAGCCAG TTTGTTATGT CTTTTCTATT CTACTGTTAT CACATTTGTA 2700 

CCACTTAAAG TGGAATCTAG GCACTTTATC ACCATTTAGA TCCTATTACC TTTTCTCATC 2760

TAGGATATAG TTATCTTCTA CATAATCTTT CTGTATCTTA AAACCCATCA ATAAATTATT 2820 

ATATATTTTC TACTTTTAAT CACTCAGAAG ATTTAAAAAA CTCATGAGAA GAGTAATCTG 2880 

TTATGTTTTT CCAGATATTT ACCATTTCTG TTGCTCTTCC TTCATTATTT TCCAAATTTC 2940

GTTCTGCAAA TTTCCACTTC TTCTGATAGA CGTTTTTTAG TTCTTTTAGA GTGGTTCTGA 3000 

TAGGTACAGA TTCTCTTATT TTTTGCTTCC TCTGAGGACA TCTTTTTCTC ACCTTCATTC 3060 

TCAGTGATGT TTTTTGCTTG TAGTATTTTT AGTTGACATT GTTTTCTGTT CAGCAGTTTC 3120 

CTTTTAGCTT CCGTATTTCC TGATGAGAAA TCTGCAGTCA TTCAAATTGT TGTTTCCCTG 3180 

TATGTAGTGT GTCATTTTTC TGTCAGATTT CAAGGTATTT ATCTTTAGTT TTTAGCCATT 3240 

TCATTATGTT GGGGATGAGT TTCCTTGTTT TATTCCCTTT GGAATTTGCT CCAATTCATA 3300

AS TTTGCAGT TTTATGTCTT TTACCAAACT TAGAGGTTTT CAGCCTAATT TCTAAAAATA 3360

Cl 'TTTATTA GCCTGATTTT CATCTTTATA GGAAATAGTT TAAGTGATGA CAAGTTCCAA 3420 

TAGCTTATAT GCCCAGAAGG CCTTCAAAAT AAGAATTTTG AAAGAATACA GAAAACAAAC 3480

TTTTATATCC TTCTCATGTC TTCTACTGTA AAATTCATAT GCTTTGCTAC TCTAAACCTA 3540

GTTTGAAATC AACAGTCTTG AGAATAGATG AAAATTTTGA TGAATAGTGG AATTCTTTTA 3600 

AATGGAAACC TCTTACATGT GATTTTCCTT GCCATCTAGA AATAAACCAT AGTATTTATG 3660

TTGAATCAAT CAATATTATA TTTTGTTTTT TTCCTCCTCT TCTGAGACTC TTATTGTGGA 3720 

AATGTTAGAC TTTTATGTTT TCCTAAATGT CCCTGATATT CTACTTATTT AGAACATCTT 3780 

TTCATTTTTT CCATTATTCT GATTGGGTAA TTTTAATTTG TCTATTTTCA AATTTGCTGG 3840

AGTGTTCACC TGTTGTTGTC TGTGTCGTCC CACTGAGTGC ATTCACCACC TTTTAAATTT 3900

TGGTCACTGT ATGTATCAGT TCTAAAATTT CCATTTTGTT CTCTATATTT TAAATTTCTT 3960

GGCTTATATT CTATTTTCCT GCAAATGTGT CAGCATTTGC TTGTTTGAGC TTTTTTTTTT 4020 

TCAAGACAGG GTCTCAACTC TGTTACCCAG GCTGGAGTGC AGTGGTGCGA TCTCAGCTCA 4080

CTGCAACCTC TGCCTCCTGG TTCAAGCGAT TATTGTGCCT CAGCCTCCTG AGTAGCTGGG 4140 

ATTACAGGCA TGCACCACCA CAGCCCAGCT AATTTTTTGT ATTTTTAGTA GAGACAGAGT 4200 

TTTGCTATGT TGGCCAGGCT GGTTTTGAAC TCCTGGCCTC AAGTGATCCA CCCACCTCAG 4260 

CCTCCCAAAG TGCTGGGATT ACAGGCCACT ACACCTGGCA CATTTGAGTA TTTTTTTTTT 4320

TTTTTTTTTT TTGAGATGGA GTCTCGCTCT GTCATCTAGG CTGGAGTGCA GTGGTGTGAT 4380 

CTCAGCTCAC TGCAGCCTCT GTCTCCCGGG CTCAAGCGAT TCTCTTGCCT CAGCCTCCTG 4440

AGTAGCTAGG ACTACAGGTG CATGCCAACA CGCCCGGCTA ATTTTTTTAA AAAATATTTT 4500 

TAGTAGAGAC AGGGTTTCAC CATTTTGGCC AGGATGGTCT CGATCTCCTG ACCTCATGAT 4560 

CCACCCGCCT CGGCCTTCCA AAGTGCTGGG ATTACAGGCA TGAGCCACCG TGCCTGGCCT 4620

CATTTGAGTA TTTTTATAAT GTCTCTTTTA AAGTCTTTGT CAGATAATTC CACTGTACAT 4680 

GTTATTCAGT GTTTGGTGTC CACTGAGTTG TCATTTGCCA GACAAGTGGA GATTTTTGCA 4740

GCTCATCCTT GTATTCTCAG TAGTTCCGAT ATGTACCCTC GACATGTGAA TGTTATCTTA 4800

TGAGACTCTG TTTTATTTGT ATCCAACAGA AGATGTTTAT TATTTATTTG GCTTTCTGTG 4860

AACTGAGGTC TTAATATCAG CTCATTTTAA AAGTCTTTGC AGTGGTATTC GGATCTATCC 4920 

TGTGTGTGCC TATGAGATTG GGTGCAGTGT ATCCTGTTAG CTCCATTCTC AGGGCGTTTG 4980

AATGTGAATT AGGACCAGCG CAATGAATGC TCAAGTTGGG GTTGGGCGTT AGAATTCATA 5040 
AAAGTCTTTA TATGCTCAG 



ACF6 DNA sequence 

Gene name: Homo sapiens cDNA FLJ10669 fis, clone NT2RP2006275, weakly similar to
Microtubule-associated protein 1B [CONTAINS: LIGHT CHAIN LC1]
Unigene number: Hs.66048
Probeset Accession #: AA609717 
Nucleic Acid Accession #: AK001531 

Coding sequence: 176-2194 (predicted start/stop codons underlined)

CATCTCCCCC AACCTGGGGG TCGTGTTCTT CAACGCCTGC GAGGCCGCGT CGCGGCTGGC 60 

GCGCGGCGAG GATGAGGCGG AGCTGGCGCT CAGCCTCCTG GCGCAGCTGG GCATCACGCC 120 

TCTGCCACTC AGCCGCGGCC CCGTGCCAGC CAAACCCACC GTGCTCTTCG AGAAGATGGG 180 

CGTGGGCCGG CTGGACATGT ATGTGCTGCA CCCGCCCTCC GCCGGCGCCG AGCGCACGCT 240 

GGCCTCTGTG TGCGCCCTGC TGGTGTGGCA CCCCGCCGGC CCCGGCGAGA AGGTGGTGCG 300

CGTGCTGTTC CCCGGTTGCA CCCCGCCCGC CTGCCTCCTG GACGGCCTGG TCCGCCTGCA 360 

GCACTTGAGG TTCCTGCGAG AGCCCGTGGT GACGCCCCAG GACCTGGAGG GGCCGGGGCG 420 

AGCCGAGAGC AAAGAGAGCG TGGGCTCCCG GGACAGCTCG AAGAGAGAGG GCCTCCTGGC 480 

CACCCACCCT AGACCTGGCC AGGAGCGCCC TGGGGTGGCC CGCAAGGAGC CAGCACGGGC 540 

TGAGGCCCCA CGCAAGACTG AGAAAGAAGC CAAGACCCCC CGGGAGTTGA AGAAAGACCC 600 

CAAACCGAGT GTCTCCCGGA CCCAGCCGCG GGAGGTGCGC CGGGCAGCCT CTTCTGTGCC 660 

CAACCTCAAG AAGACGAATG CCCAGGCGGC ACCCAAGCCC CGCAAAGCGC CCAGCACGTC 720 

CCACTCTGGC TTCCCGCCGG TGGCAAATGG ACCCCGCAGC CCGCCCAGCC TCCGATGTGG 780 

AGAAGCCAGC CCCCCCAGTG CAGCCTGCGG CTCTCCGGCC TCCCAGCTGG TGGCCACGCC 840 

CAGCCTGGAG CTGGGGCCGA TCCCAGCCGG GGAGGAGAAG GCACTGGAGC TGCCTTTGGC 900 

CGCCAGCTCA ATCCCAAGGC CACGCACACC CTCCCCTGAG TCCCACCGGA GCCCCGCAGA 960

GGGCAGCGAG CGGCTGTCGC TGAGCCCACT GCGGGGCGGG GAGGCCGGGC CAGACGCCTC 1020 

ACCCACAGTG ACCACACCCA CGGTGACCAC GCCCTCACTA CCCGCAGAGG TGGGCTCCCC 1080 

GCACTCGACC GAGGTGGACG AGTCCCTGTC GGTGTCCTTT GAGCAGGTGC TGCCGCCATC 1140 

CGCCCCCACC AGTGAGGCTG GGCTGAGCCT CCCGCTGCGT GGCCCCCGGG CGCGGCGCTC 1200 

GGCTTCCCCA CACGATGTGG ACCTGTGCCT GGTGTCACCC TGTGAATTTG AGCATCGCAA 1260 

GGCGGTGCCA ATGGCACCGG CACCTGCGTC CCCCGGCAGC TCGAATGACA GCAGTGCCCG 1320

GTCACAGGAA CGGGCAGGTG GGCTGGGGGC CGAGGAGACG CCACCCACAT CGGTCAGCGA 1380

GTCCCTGCCC ACCCTGTCTG ACTCGGATCC CGTGCCCCTG GCCCCCGGTG CGGCAGACTC 1440 

AGACGAAGAC ACAGAGGGCT TTGGAGTCCC TCGCCACGAC CCTTTGCCTG ACCCCCTCAA 1500 

GGTCCCCCCA CCACTGCCTG ACCCATCCAG CATCTGCATG GTGGACCCCG AGATGCTGCC 1560 

CCCCAAGACA GCACGGCAAA CGGAGAACGT CAGCCGCACC CGGAAGCCCC TGGCCCGCCC 1620 

CAACTCACGC GCTGCCGCCC CCAAAGCCAC TCCAGTGGCT GCTGCCAAAA CCAAGGGGCT 1680 

TGCTGGTGGG GACCGTGCCA GCr -ACCACT CAGTGCCCGG AGTGAGCCCA GTGAGAAGGG 1740

AGGCCGGGCA CCCCTGTCCA GA<\\GTCCTC AACCCCCAAG ACTGCCACTC GAGGCCCGTC 1800

GGGGTCAGCC AGCAGCCGGC CCGGGGTGTC AGCCACCCCA CCCAAGTCCC CGGTCTACCT 1860 

GGACCTGGCC TACCTGCCCA GCGGGAGCAG CGCCCACCTG GTGGATGAGG AGTTCTTCCA 1920

GCGCGTGCGC GCGCTCTGCT ACGTCATCAG TGGCCAGGAC CAGCGCAAGG AGGAAGGCAT 1980 

GCGGGCCGTC CTGGACGCGC TACTGGCCAG CAAGCAGCAT TGGGACCGTG ACCTGCAGGT 2040 

GACCCTGATC CCCACTTTCG ACTCGGTGGC CATGCATACG TGGTACGCAG AGACGCACGC 2100 

CCGGCACCAG GCGCTGGGCA TCACGGTGTT GGGCAGCAAC GGCATGGTGT CCATGCAGGA 2160 

TGACGCCTTC CCGGCCTGCA AGGTGGAGTT CTAGCCCCAT CGCCGACACG CCCCCCACTC 2220
AGCCCAGCCC GCCTGTCCCT AGATTCAGCC ACATCAGAAA TAAACTGTGA CTACACTTG 
TABLE 2



AAA4 Protein sequence: 
Gene name: CGI -100 protein 
Unigene number: Hs.275253
Probeset Accession #: AA089688 
Protein Accession #: NP_057124 

Signal sequence: predicted 1-23 (first underlined sequence) 
Transmembrane Domain: predicted 201-217 (second underlined sequence) 
emp24/gp25L/p24 domain: predicted 13-227 

Summary: gp25L/emp24/p24 protein family members of the cis-Golgi network bind both 
COP I and II coatomer. Members of this family are implicated in bringing cargo 
forward from the ER and binding to coat proteins by their cytoplasmic domains. 

MGDKIWLPFP VLLLAALPPV LLPGAAGFTP SLDSDFTFTL PAGQKECFYQ PMPLKASLEI 60

EYQVLDGAGL DIDFHLASPE GKTLVFEQRK SDGVHTVETE VGDYMFCFDN TFSTISEKVI 120 

FFELILDNMG EQAQEQEDWK KYITGTDILD MKLEDILESI NSIKSRLSKS GHIQTLLRAF 180 
EARDRNIQES NFDRVNFWSM VNLWMVWS AIOVYML KSL FEDKRKSRT 



AAA7 Protein sequence:

Gene name: Endothelial differentiation, sphingolipid G-protein-coupled receptor, 1 
(EDG1) 

Unigene number: Hs.154216
Probeset Accession #: M31210 
Protein Accession #: NP_001391

7 Transmembrane Domains: predicted 50-71, 92-110, 122-140, 160-177, 201-222, 251-269, 281-301 (underlined sequences)

Summary: Endothelial differentiation, sphingolipid G-protein-coupled receptor, 1 
may regulate the differentiation of endothelial cells. It binds the sphingolipid 
metabolite, sphingosine-1-phosphate, which may function as a second messenger in
cell proliferation and survival. 

MGPTSVPLVK AHRSSVSDYV NYDIIVRHYN YTGKLNISAD KENSIKLTSV VFILICCFII 60
LENTFVLLTI WKTKKFHRPM YYFIGNLALS DLLAGVAYTA NLLLSGATTY KLTPAQWFLR 120
EGSMFVALSA SVFSLLAIAI ERYITMLKMK LHNGSNNFRL FLLISACWVI SLILGGLPIM 180
GWNCISALSS CSTVLPLYHK HYILFCTTVF TLLLLSIVIL YCRIYSLVRT RSRRLTFRKN 240
ISKASRSSEN VALLKTVIIV LSVFIACWAP LFILLLLDVG CKVKTCDILF RAEYFLVLAV 300
LNSGTNPIIY TLTNKEMRRA FIRIMSCCKC PSGDSAGKFK RPIIAGMEFS RSKSDNSSHP 360 
QKDEGDNPET IMSSGNVNSS S 



AAB3 Protein sequence: 

Gene name: Solute carrier family 20 (phosphate transporter), member 1, Gibbon ape
leukaemia virus receptor 1 (GLVR1)
Unigene number: Hs.78452 
Probeset Accession #: L20859 
Protein Accession #: NP_005406

Transmembrane domains: predicted 24-40, 62-78, 164-180, 198-214, 232-248, 513-529, 
562-578, 604-620, 655-671 

Cellular Localization: Likely a Type IIIa membrane protein (Ncyt Cexo)

MATLITSTTA ATAASGPLVD YLWMLILGFI IAFVLAFSVG ANDVANSFGT AVGSGWTLK 60

QACILASIFE TVGSVLLGAK VSETIRKGLI DVEMYNSTQG LLMAGSVSAM FGSAVWQLVA 120

SFLKLPISGT HCIVGATIGF SLVAKGQEGV KWSELIKIVM SWFVSPLLSG IMSGILFFLV 180

RAFILHKADP VPNGLRALPV FYACTVGINL FSIMYTGAPL LGFDKLPLWG TILISVGCAV 240

FCALIVWFFV CPRMKRKIER EIKCSPSESP LMEKKNSLKE DHEETKLSVG DIENKHPVSE 300

VGPATVPLQA WEERTVSFK LGDLEEAPER ERLPSVDLKE ETSIDSTVNG AVQLPNGNLV 360 

QFSQAVSNQI NSSGHSQYHT VHKDSGLYKE LLHKLHLAKV Gr^lGDSGDK PLRRNNSYTS 420 

YTMAICGMPL DSFRAKEGEQ KGEEMEKLTW PNADSKKRIR ML.- ifTSYCNA VSDLHSASEI 480 

DMSVKAAMGL GDRKGSNGSL EEWYDQDKPE VSLLFOFLOI LTACFGSFAH GGNDVSNAIG 540 

PLVALYLVYD TGDVSSKVAT PIWLLLYGGV GICVGLWVWG RRVIQTMGKD LTPITPSSGF 600

SIELASALTV VIASNIGLPI STTHCKVGSV VSVGWLRSKK AVDWRLFRNI FMAWFVTVPI 660
SGVISAAIMA IFRYVILRM 



AAB4 Protein sequence: 
Gene name: Matrix metalloproteinase 10 (stromelysin 2)
Unigene number: Hs.2258 
Probeset Accession #: X07820 
Protein Accession #: NP_002416

Signal sequence: predicted 1-17 (underlined sequence) 
Cellular Localization: predicted secreted 

MMHLAFLVLL CLPVCSAYPL SGAAKEEDSN KDLAQQYLEK YYNLEKDVKQ FRRKDSNLIV 60
KKIQGMQKFL GLEVTGKLDT DTLEVMRKPR CGVPDVGHFS SFPGMPKWRK THLTYRIVNY 120 
TPDLPRDAVD SAIEKALKVW EEVTPLTFSR LYEGEADIMI SFAVKEHGDF YSFDGPGHSL 180 
AHAYPPGPGL YGDIHFDDDE KWTEDASGTN LFLVAAHELG HSLGLFHSAN TEALMYPLYN 240 
SFTELAQFRL SQDDVNGIQS LYGPPPASTE EPLVPTKSVP SGSEMPAKCD PALSFDAIST 300
LRGEYLFFKD RYFWRRSHWN PEPEFHLISA FWPSLPSYLD AAYEVNSRDT VFIFKGNEFW 360 
AIRGNEVQAG YPRGIHTLGF PPTIRKIDAA VSDKEKKKTY FFAADKYWRF DENSQSMEQG 420 
FPRLIADDFP GVEPKVDAVL QAFGFFYFFS GSSQFEFDPN ARMVTHILKS NSWLHC



AAB6 Protein sequence:
Gene name: Podocalyxin-like 
Unigene number: Hs.16426
Probeset Accession #: U97519 
Protein Accession #: NP_005388 

Transmembrane domain: predicted 432-448 (underlined sequence) 
Cellular Localization: predicted Type Ia membrane protein (Nexo)

MRCALALSAL LLLLSTPPLL PSSPSPSPSP SPSQNATQTT TDSSNKTAPT PASSVTIMAT 60 

DTAQQSTVPT SKANEILASV KATTLGVSSD SPGTTTLAQQ VSGPVNTTVA RGGGSGNPTT 120

TIESPKSTKS ADTTTVATST ATAKPNTTSS QNGAEDTTNS GGKSSHSVTT DLTSTKAEHL 180 
TTPHPTSPLS PRQPTLTHPV ATPTSSGHDH LMKISSSSST VAIPGYTFTS PGMTTTLPSS 240

VISQRTQQTS SQMPASSTAP SSQETVQPTS PATALRTPTL PETMSSSPTA ASTTHRYPKT 300 

PSPTVAHESN WAKCEDLETQ TQSEKQLVLN LTGNTLCAGG ASDEKLISLI CRAVKATFNP 360

AQDKCGIRLA SVPGSQTVW KEITIHTKLP AKDVYERLKD KWDELKEAGV SDMKLGDQGP 420 

PEEAEDRFSM PLIITIVCMA SFLLLVAALY GCCHQRLSQR KDQQRLTEEL QTVENGYHDN 480
PTLEVMETSS EMQEKKWSL NGELGDSWIV PLDNLTKDDL DEEEDTHL 



AAB8 Protein sequence: 

Gene name: EGF-containing fibulin-like extracellular matrix protein 1

Unigene number: Hs. 76224 

Probeset Accession #: U03877 

Protein Accession #: NP_004096 Variant 1 

Signal sequence: predicted 1-17 (underlined sequence) 

Summary: This gene spans approximately 18 kb of genomic DNA and consists of 12
exons. Two transcripts with distinct 5' UTRs have been described; the resulting
proteins have distinct N-terminal amino acid sequences. Translation initiation
from internal methionine residues was observed with in vitro translation. A signal
peptide sequence is predicted for translation initiation sites 1, 2, and 4. The
protein isoforms contain 5 or 6 calcium-binding EGF2 domains and 5 or 6 EGF2
domains. Mutations in this gene cause the retinal disease Malattia Leventinese.
Transcript Variant: This variant (1) has a distinct 5' UTR and N-terminal protein
sequence as compared to variant 2.

MLKALFLTML TLALVKSQDT EETITYTQCT DGYEWDPVRQ QCKDIDECDI VPDACKGGMK 60
CVNHYGGYLC LPKTAQIIVN NEQPQQETQP AEGTSGATTG WAASSMATS GVLPGGGFVA 120 
SAAAVAGPEM QTGRNNFVIR RNPADPQRIP SNPSHRIQCA AGYEQSEHNV CQDIDECTAG 180 
THNCRADQVC INLRGSFACQ CPPGYQKRGE QCVDIDECTI PPYCHQRCVN TPGSFYCQCS 240
PGFQLAANNY TCVDINECDA SNQCAQQCYN ILGSFICQCN QGYELSSDRL NCEDIDECRT 300 
SSYLCQYQCV NEPGKFSCMC PQGYQWRSR TCQDINECET TNECREDEMC WNYHGGFRCY 360 
PRNPCQDPYI LTPENRCVCP VSNAMCRELP QSIVYKYMSI RSDRSVPSDI FQIQATTIYA 420
NTINTFRIKS GNENGEFYLR QTSPVSAMLV LVKSLSGPRE HIVDLEMLTV SSIGTFRTSS 480
VLRLTIIVGP FSF 



AAB9 Protein sequence: 

Gene name: Melanoma adhesion molecule, MUC 18 glycoprotein 
Unigene number: Hs.211579
Probeset Accession #: M28882 
Protein Accession #: NP_006491
Signal sequence: predicted 1-17 (first underlined sequence)
Transmembrane domain: predicted 559-575 (second underlined sequence) 
Cellular localization: predicted Type Ia membrane protein (Nexo)

MGLPRLVCAF LLAACCCCPR VAGVPGEAEQ PAPELVEVEV GSTALLKCGL SQSQGNLSHV 60

DWFSVHKEKR TLIFRVRQGQ GQSEPGEYEQ RLSLQDRGAT LALTQVTPQD ERIFLCQGKR 120

PRSQEYRIQL RVYKAPEEPN IQVNPLGIPV NSKEPEEVAT CVGRNGYPIP QVIWYKNGRP 180 

LKEEKNRVHI QSSQTVESSG LYTLQSILKA QLVKEDKDAQ FYCELNYRLP SGNHMKESRE 240 

VTVPVFYPTE KVWLEVEPVG MLKEGDRVEI RCLADGNPPP HFSISKQNPS TREAEEETTN 300

DNGVLVLEPA RKEHSGRYEC QAWNLDTMIS LLSEPQELLV NYVSDVRVSP AAPERQEGSS 360 

LTLTCEAESS QDLEFQWLRE ETDQVLERGP VLQLHDLKRE AGGGYRCVAS VPSIPGLNRT 420 

QLVKIAIFGP PWMAFKERKY WVKENMVLNL SCEASGHPRP TISWNVNGTA SEQDQDPQRV 480 

LSTLNVLVTP ELLETGVECT ASNDLGKNTS ILFLELVNLT TLTPDSNTTT GLSTSTASPH 540 

TRANSTSTER KLPEPESRGV VIVAVIVCIL VLAVLGAVLY FLYKKGKLPC RRSGKQEITL 600
PPSRKTELW EVKSDKLPEE MGLLQGSSGD KRAPGDQGEK YIDLRH 



AAC1 Protein sequence:

Gene name: Matrix metalloproteinase 1 (interstitial collagenase) 
Unigene number: Hs.83169
Probeset Accession #: X54925 
Protein Accession #: NP_002412 

Signal sequence: predicted 1-19 (underlined sequence) 
Cellular localization: predicted secreted protein 

MHSFPPLLLL LFWGWSHSF PATLETQEQD VDLVQKYLEK YYNLKNDGRQ VEKRRNSGPV 60
VEKLKQMQEF FGLKVTGKPD AETLKVMKQP RCGVPDVAQF VLTEGNPRWE QTHLTYRIEN 120 
YTPDLPRADV DHAIEKAFQL WSNVTPLTFT KVSEGQADIM ISFVRGDHRD NSPFDGPGGN 180
LAHAFQPGPG IGGDAHFDED ERWTNNFREY NLHRVAAHEL GHSLGLSHST DIGALMYPS? 240 
TFSGDVQLAQ DDIDGIQAIY GRSQNPVQPI GPQTPKACDS KLTFDAITTI RGEVMFFKDR 300
FYMRTNPFYP EVELNFISVF WPQLPNGLEA AYEFADRDEV RFFKGNKYWA VQGQNVLHGY 360 
PKDIYSSFGF PRTVKHIDAA LSEENTGKTY FFVANKYWRY DEYKRSMDPG YPKMIAHDFP 420 
GIGHKVDAVF MKDGFFYFFH GTRQYKFDPK TKRILTLQKA NSWFNCRKN



AAC3 Protein sequence:

Gene name: Branched chain aminotransferase 1, cytosolic 
Unigene number: Hs.157205
Probeset Accession #: AA423987
Protein Accession #: NP_005495 
Cellular Localization: cytoplasmic

Summary: The lack of the cytosolic enzyme branched-chain amino acid transaminase
(BCT) causes cell growth inhibition. There may be at least 2 different clinical
disorders due to a defect of branched-chain amino acid transamination:
hypervalinemia and hyperleucine-isoleucinemia. Since there are 2 distinct BCATs,
mitochondrial and cytosolic, it is possible that one is mutant in each of these 2
conditions.

MDCSNGSAEC TGEGGSKEW GTFKAKDLIV TPATILKEKP DPNNLVFGTV FTDHMLTVEW 60
SSEFGWEKPH IKPLQNLSLH PGSSALHYAV ELFEGLKAFR GVDNKIRLFQ PNLNMDRMYR 120 
SAVRATLPVF DKEELIiECIQ QLVKLDQEWV PYSTSASLYI RPAFIGTEPS LGVKKPTKAL 180 
LFVLLSPVGP YFSSGTFNPV SLWANPKYVR AWKGGTGDCK MGGNYGSSLF AQCEDVDNGC 240 
QQVLWLYGRD HQITEVGTMN LFLYWINEDG EEELATPPLD GIILPGVTRR CILDLAHQWG 300
EFKVSERYLT MDDLTTALEG NRVREMFSSG TACWCPVSD ILYKGETIHI PTMENGPKLA 360 
SRILSKLTDI QYGREESDWT IVLS 



ACG4 Protein sequence: 

Gene name: Pentaxin-related gene, rapidly induced by IL-1 beta 
Unigene number: Hs.2050
Probeset Accession #: M31166 
Protein Accession #: NP_002843

Signal sequence: predicted 1-17 (underlined sequence) 
Cellular localization: predicted secreted 

Summary: TNF-inducible member of hyaluronate-binding protein family, related to
CD44 

MHLLAILFCA LWSAVLAENS DDYDLMYVNL DNEIDNGLHP TEDPTPCDCG QEHSEWDKLF 60
IMLENSQMRE RMLLQATDDV LRGELQRLRE ELGRLAESLA RPCAPGAPAE ARLTSALDEL 120

LQATRDAGRR LARMEGAEAQ RPEEAGRALA AVLEELRQTR ADLHAVQGWA ARSWLPAGCE 180 

TAILFPMRSK KIFGSVHPVR PMRLESFSAC IWVKATDVLN KTILFSYGTK RNPYEIQLYL 240

SYQSIVFWG GEENKLVAEA MVSLGRWTHL CGTWNSEEGL TSLWVNGELA ATTVEMATGH 300

IVPEGGILQI GQEKNGCCVG GGFDETLAFS GRLTGFNIWD SVLSNEEIRE TGGAESCHIR 360
GNIVGWGVTE IQPHGGAQYV S 

ACK5 Protein sequence: 
Gene name: Von Willebrand factor; Coagulation factor VIII
Unigene number: Hs. 110802 
Probeset Accession #: M10321
Protein Accession #: NP_000543 

Signal peptide: predicted 1-22 (underlined sequence) 
Cellular localization: predicted secreted

MIPARFAGVL LALALILPGT LCAEGTRGRS STARCSLFGS DFVNTFDGSM YSFAGYCSYL 60

LAGGCQKRSF SIIGDFQNGK RVSLSVYLGE FFDIHLFVNG TVTQGDQRVS MPYASKGLYL 120

ETEAGYYKLS GEAYGFVARI DGSGNFQVLL SDRYFNKTCG LCGNFNIFAE DDFMTQEGTL 180

TSDPYDFANS WALSSGEQWC ERASPPSSSC NISSGEMQKG LWEQCQLLKS TSVFARCHPL 240

VDPEPFVALC EKTLCECAGG LECACPALLE YARTCAQEGM VLYGWTDHSA CSPVCPAGME 300 

YRQCVSPCAR TCQSLHINEM CQERCVDGCS CPEGQLLDEG LCVESTECPC VHSGKRYPPG 360

TSLSRDCNTC ICRNSQWICS NEECPGECLV TGQSHFKSFD NRYFTFSGIC QYLLARDCQD 420
HSFSIVIETV QCADDRDAVC TRSVTVRLPG LHNSLVKLKH GAGVAMDGQD IQLPLLKGDL 480
RIQHTVTASV RLSYGEDLQM DWDGRGRLLV KLSPVYAGKT CGLCGNYNGN QGDDFLTPSG 540
LAEPRVEDFG NAWKLHGDCQ DLQKQHSDPC ALNPRMTRFS EEACAVLTSP TFEACHRAVS 600

PLPYLRNCRY DVCSCSDGRE CLCGALASYA AACAGRGVRV AWREPGRCEL NCPKGQVYLQ 660 

CGTPCNLTCR SLSYPDEECN EACLEGCFCP PGLYMDERGD CVPKAQCPCY YDGEIFQPED 720
IFSDHHTMCY CEDGFMHCTM SGVPGSLLPD AVLSSPLSHR SKRSLSCRPP MVKLVCPADN 780
LRAEGLECTK TCQNYDLECM SMGCVSGCLC PPGMVRHENR CVALERCPCF HQGKEYAPGE 840

TVKIGCNTCV CRDRKWNCTD HVCDATCSTI GMAHYLTFDG LKYLFPGECQ YVLVQDYCGS 900 

NPGTFRILVG NKGCSHPSVK CKKRVTILVE GGEIELFDGE VNVKRPMKDE THFEWESGR 960

YIILLLGKAL SWWDRHLSI SWLKQTYQE KVCGLCGNFD GIQNNDLTSS NLQVEEDPVD 1020

FGNSWKVSSQ CADTRKVPLD SSPATCHNNI MKQTMVDSSC RILTSDVFQD CNKLVDPEPY 1080

LDVCIYDTCS CESIGDCACF CDTIAAYAHV CAQHGKWTW RTATLCPQSC EERNLRENGY 1140

ECEWRYNSCA PACQVTCQHP EPLACPVQCV EGCHAHCPPG KILDELLQTC VDPEDCPVCE 1200

VAGRRFASGK KVTLNPSDPE HCQICHCDW NLTCEACQEP GGLWPPTDA PVSPTTLYVE 1260

DISEPPLHDF YCSRLLDLVF LLDGSSRLSE AEFEVLKAFV VDMMERLRIS QKWVRVAWE 1320

YHDGSHAYIG LKDRKRPSEL RRIASQVKYA GSQVASTSEV LKYTLFQIFS KIDRPEASRI 1380

ALLLMASQEP QRMSRNFVRY VQGLKKKKVI VIPVGIGPHA NLKQIRLIEK QAPENKAFVL 1440

SSVDELEQQR DEIVSYLCDL APEAPPPTLP PHMAQVTVGP GLLGVSTLGP KRNSMVLDVA 1500 

FVLEGSDKIG EADFNRSKEF MEEVIQRMDV GQDSIHVTVL QYSYMVTVEY PFSEAQSKGD 1560 

ILQRVREIRY QGGNRTNTGL ALRYLSDHSF LVSQGDREQA PNLVYMVTGN PASDEIKRLP 1620 

GDIQWPIGV GPNANVQELE RIGWPNAPIL IQDFETLPRE APDLVLQ^CC SGEGLQIPTL 1680 

SPAPDCSQPL DVILLLDGSS SFPASYFDEM KSFAKAFISK ANIGPRLTQV SVLQYGSITT 1740

IDVPWNWPE KAHLLSLVDV MQREGGPSQI GDALGFAVRY LTSEMHGARP GASKAWILV 1800 

TDVSVDSVDA AADAARSNRV TVFPIGIGDR YDAAQLRILA GPAGDSNWK LQRIEDLPTM 1860

VTLGNSFLHK LCSGFVRICM DEDGNEKRPG DVWTLPDQCH TVTCQPDGQT LLKSHRVNCD 1920 

RGLRPSCPNS QSPVKVEETC GCRWTCPCVC TGSSTRHIVT FDGQNFKLTG SCSYVLFQNK 1980 

EQDLEVILHN GACSPGARQG CMKSIEVKHS ALSVELHSDM EVTVNGRLVS VPYVGGNMEV 2040

NVYGAIMHEV RFNHLGHIFT FTPQNNEFQL QLSPKTFASK TYGLCGICDE NGANDFMLRD 2100

GTVTTDWKTL VQEWTVQRPG QTCQPILEEQ CLVPDSSHCQ VLLLPLFAEC HKVLAPATFY 2160 

AICQQDSCHQ EQVCEVIASY AHLCRTNGVC VDWRTPDFCA MSCPPSLVYN HCEHGCPRHC 2220 

DGNVSSCGDH PSEGCFCPPD KVMLEGSCVP EEACTQCIGE DGVQHQFLEA WVPDHQPCQI 2280

CTCLSGRKVN CTTQPCPTAK APTCGLCEVA RLRQNADQCC PEYECVCDPV SCDLPPVPHC 2340

ERGLQPTLTN PGECRPNFTC ACRKEECKRV SPPSCPPHRL PTLRKTQCCD EYECACNCVN 2400 

STVSCPLGYL ASTATNDCGC TTTTCLPDKV CVHRSTTYPV GQFWEEGCDV CTCTDMEDAV 2460

MGLRVAQCSQ KPCEDSCRSG FTYVLHEGEC CGRCLPSACE WTGSPRGDS QSSWKSVGSQ 2520

WASPENPCLI NECVRVKEEV FIQQRNVSCP ^LEVPVCPSG FQLSCKTSAC CPSCRCERME 2580

ACMLNGTVIG PGKTVMIDVC TTCRCMVQVG* ° ISGFKLECR KTTCNPCPLG YKEENNTGEC 2640

CGRCLPTACT IQLRGGQIMT LKRDETLQDG -JDTHFCKVNE RGEYFWEKRV TGCPPFDEHK 2700 

CLAEGGKIMK IPGTCCDTCE EPECNDITAR LQYVKVGSCK SEVEVDIHYC QGKCASKAMY 2760 
SIDINDVQDQ CSCCSPTRTE PMQVALHCTN GSWYHEVLN AMECKCSPRK CSK 
AAC7 Protein sequence:

Gene name: KIAA1294 protein 

Probeset Accession #: AA432248 
Protein Accession #: BAA92532

Cellular localization: predicted nuclear protein 

PFAM prediction: 22-153 Band 4.1 domain (underlined sequence). A number of
cytoskeletal-associated proteins that associate with various proteins at the
interface between the plasma membrane and the cytoskeleton contain a conserved
N-terminal domain of about 150 amino acid residues.

MAVQLVPDSA LGLLMMTEGR RCOVHLLDDR KLELLVOPKL LAKELLDLVA SHFNLKEKEY 
FGIAFTDETG HLNWLOLDRR VLEHDFPKKS GPWLYFCVR FYIESISYLK DNATIELFFL
NAKSCIYKEL IDVDSEWFE LASYILOEAK GDFSSNEWR SDLKKLPALP TQALKEHPSL 
AYCEDRVIEH YKKLNGQTRG QAIVNYMSIV ESLPTYGVHY YAVKDKQGIP WWLGLSYKGI 
FQYDYHDKVK PRKIFQWRQL ENLYFREKKF SVEVHDPRRA SVTRRTFGHS GIAVHTWYAC 
PALIKSIWAM AISQHQFYLD RKQSKSKIHA ARSLSEIAID LTETGTLKTS KLANMGSKGK 
IISGSSGSLL SSGSQESDSS QSAKKDMLAA LKSRQEALEE TLRQRLEELK KLCLREAELT 
GKLPVEYPLD PGEEPPIVRR RIGTAFKLDE QKILPKGEEA ELERLEREFA IQSQITEAAR 
RLASDPNVSK KLKKQRKTSY LNALKKLQEI ENAINENRIK SGKKPTQRAS LIIDDGNIAS
EDSSLSDALV LEDEDSQVTS TISPLHSPHK GLPPRPPSHN RPPPPQSLEG LRQMHYHRND 
YDKSPIKPKM WSESSLDEPY EKVKKRSSHS HSSSHKRFPS TGSCAEAGGG SNSLQNSPIR 
GLPHWNSQSS MPSTPDLRVR SPHYVHSTRS VDISPTRLHS LALHFRHRSS SLESQGKLLG 
SENDTGSPDF YTPRTRSSNG SDPMDDCSSC TSHSSSEHYY PAQMNANYST LAEDSPSKAR 
QRQRQRQRAA GALGSASSGS MPNLAARGGA GGAGGAGGGV YLHSQSQPSS QYRIKEYPLY 
IEGGATPWV RSLESDQECH YSVKAQFKTS NSYTAGGLFK ESWRGGGGDE GDTGRLTPSR 
SQILRTPSLG REGAHDKGAG RAAVSDELRQ WYQRSTASHK EHSRLSHTSS TSSDSGSQYS 
TSSQSTFVAH SRVTRMPQMC KATSAALPQS QRSSTPSSEI GATPPSSPHH ILTWQTGEAT 
ENSPILDGSE SPPHQSTDE 



ACG8 Protein sequence: 

Gene name: ubiquitin E3 ligase SMURF2
Unigene number: Hs.21806 (3'UTR only)
Probeset Accession #: AA398243 
Protein Accession #: AF301463_1 
Cellular Localization: predicted cytoplasmic 
Summary: Smurf2 is a ubiquitin E3 ligase that mediates proteasome-dependent
degradation of Smad2 in transforming growth factor-beta signaling.

MSNPGGRRNG PVKLRLTVLC AKNLVKKDFF RLPDPFAKW VDGSGQCHST DTVKNTLDPK 60 

WNQHYDLYIG KSDSVTISW NHKKIHKKQG AGFLGCVRLL SNAINRLKDT GYQRLDLCKL 120

GPNDNDTVRG QIWSLQSRD RIGTGGQWD CSRLFDNDLP DGWEERRTAS GRIQYLNHIT 180 

RTTQWERPTR PASEYSSPGR PLSCFVDENT PISGTNGATC GQSSDPRLAE RRVRSQRHRN 240

YMSRTHLHTP PDLPEGYEQR TTQQGQVYFL HTQTGVSTWH DPRVPRDLSN INCEELGPLP 300 

PGWEIRNTAT GRVYFVDHNN RTTQFTDPRL SANLHLVLNR QNQLKDQQQQ QWSLCPDDT 360 

ECLTVPRYKR DLVQKLKILR QELSQQQPQA GHCRIEVSRE EIFEESYRQV MKMRPKDLWK 420 

RLMIKFRGEE GLDYGGVARE WLYLLSHEML NPYYGLFQYS RDDIYTLQIN PDSAVNPEHL 480

SYFHFVGRIM GMAVFHGHYI DGGFTLPFYK QLLGKSITLD DMELVDPDLH NSLVWILEND 540 

ITGVLDHTFC VEHNAYGEII QHELKPNGKS IPVNEENKKE YVRLYVNWRF LRGIEAQFLA 600

LQKGFNEVIP QHLLKTFDEK ELELIICGLG KIDVNDWKVN TRLKHCTPDS NIVKWFWKAV 660 

EFFDEERRAR LLQFVTGSSR VPLQGFKALQ GAAGPRLFTI HQIDACTNNL PKAHTCFNRI 720
DIPPYESYEK LYEKLLTAIE ETCGFAVE 



ACH1 Protein sequence: 
Gene name: EST
Unigene number: Hs.30089 
Probeset Accession #: AA410480 
CAT cluster#: cluster 96816_1 
Summary: predicted open reading frame 

PLWTEPPLSC CLPATYPADR GPAEPCSCAG VILGFLLFRG HNSQPTMTQT S^SQGGLGGL 60 

SLTTEPVSSN PGYIPSSEAN RPSHLSSTGT PGAGVPSSGR DGGTSRDTFQ 1 PPNSTTMS 120 

LSMREDATIL PSPTSETVLT VAAFGVISFI VILVWVIIL VGWSLRFKC Rr^KESGDPQ 180 
KPGEREEKVG HRREPYPWN 



ACJ2 Protein sequence:

Gene name: Complement component C1q receptor
Unigene number: Hs.97199
Probeset Accession #: AA487558 
Protein Accession #: NP_036204

Signal sequence: 1-17 (first underlined sequence) 
Transmembrane domain: 589-605 (second underlined sequence)

Cellular localization: This gene encodes a predicted type I membrane protein. 
Summary: This protein acts as a receptor for complement protein C1q, mannose-binding
lectin, and pulmonary surfactant protein A. This protein is a functional
receptor involved in ligand-mediated enhancement of phagocytosis.

MATSMGLLLL LLLLLTOPGA GTGADTEAW CVGTACYTAH SGKLSAAEAQ NHCNQNGGNL 60

ATVKSKEEAQ HVQRVLAQLL RREAALTARM SKFWIGLQRE KGKCLDPSLP LKGFSWVGGG 120 

EDTPYSNWHK ELRNSCISKR CVSLLLDLSQ PLLPNRLPKW SEGPCGSPGS PGSNIEGFVC 180 

KFSFKGMCRP LALGGPGQVT YTTPFQTTSS SLEAVPFASA ANVACGEGDK DETQSHYFLC 240

KEKAPDVFDW GSSGPLCVSP KYGCNFNNGG CHQDCFEGGD GSFLCGCRPG FRLLDDLVTC 300 

ASRNPCSSSP CRGGATCVLG PHGKNYTCRC PQGYQLDSSQ LDCVDVDECQ DSPCAQECVN 360

TPGGFRCECW VGYEPGGPGE GACQDVDECA LGRSPCAQGC TNTDGSFHCS CEEGYVLAGE 420 

DGTQCQDVDE CVGPGGPLCD SLCFNTQGSF HCGCLPGWVL APNGVSCTMG PVSLGPPSGP 480

PDEEDKGEKE GSTVPRAATA SPTRGPEGTP KATPTTSRPS LSSDAPITSA PLKMLAPSGS 540 

SGVWREPSIH HATAASGPQE PAGGDSSVAT QNNDGTDGQK LLLFYILGTV VAILLLLALA 600
LGLLVYRKRR AKREEKKEKK PQNAADSYSW VPERAESRAM ENQYSPTPGT DC 



ACJ3 Protein sequence: 

Gene name: FLT1/vascular endothelial growth factor receptor
Unigene number: Hs.138671
Probeset Accession #: AA047437 

Transmembrane domain: predicted 764-780 (underlined sequence)
Cellular Localization: predicted cell surface tyrosine kinase 

MVSYWDTGVL LCALLSCLLL TGSSSGSKLK DPELSLKGTQ HIMQAGQTLH LQCRGEAAHK
WSLPEMVSKE SERLSITKSA CGRNGKQFCS TLTLNTAQAN HTGFYSCKYL AVPTSKKKET
ESAIYIFISD TGRPFVEMYS EIPEIIHMTE GRELVIPCRV TSPNITVTLK KFPLDTLIPD 
GKRIIWDSRK GFIISNATYK EIGLLTCEAT VNGHLYKTNY LTHRQTNTII DVQISTPRPV 
KLLRGHTLVL NCTATTPLNT RVQMTWSYPD EKNKRASVRR RIDQSNSHAN IFYSVLTIDK 
MQNKDKGLYT CRVRSGPSFK SVNTSVHIYD KAFITVKHRK QQVLETVAGK RSYRLSMKVK 
AFPSPEWWL KDGLPATEKS ARYLTRGYSL IIKDVTEEDA GNYTILLSIK QSNVFKNLTA
TLIVNVKPQI YEKAVSSFPD PALYPLGSRQ ILTCTAYGIP QPTIKWFWHP CNHNHSEARC 
DFCSNNEESF ILDADSNMGN RIESITQRMA IIEGKNKMAS TLWADSRIS GIYICIASNK
VGTVGRNISF YITDVPNGFH VNLEKMPTEG EDLKLSCTVN KFLYRDVTWI LLRTVNNRTM 
HYSISKQKMA ITKEHSITLN LTIMNVSLQD SGTYACRARN VYTGEEILQK KEITIRDQEA 
PYLLRNLSDH TVAISSSTTL DCHANGVPEP QITWFKNNHK IQQEPGIILG PGSSTLFIER 
VTEEDEGVYH CKATNQKGSV ESSAYLTVQG TSDKSNLELI TT.TPTCVAAT LFWLLLTLLI 
RKMKRSSSEI KTDYLSIIMD PDEVPLDEQC ERLPYDASKW EFARERLKLG KSLGRGAFGK 
WQASAFGIK KSPTCRTVAV KMLKEGATAS EYKALMTELK ILTHIGHHLN WNLLGACTK 
QGGPLMVIVE YCKYGNLSNY LKSKRDLFFL NKDAALHMEP KKEKMEPGLE QGKKPRLDSV 
TSSESFASSG FQEDKSLSDV EEEEDSDGFY KEPITMEDLI SYSFQVARGM EFLSSRKCIH
RDLAARNILL SENNWKICD FGLARDIYKN PDYVRKGDTR LPLKWMAPES IFDKIYSTKS 
DVWSYGVLLW EIFSLGGSPY PGVQMDEDFC SRLREGMRMR APEYSTPEIY QIMLDCWHRD
PKERPRFAEL VEKLGDLLQA NVQQDGKDYI PINAILTGNS GFTYSTPAFS EDFFKESISA 
PKFNSGSSDD VRYVNAFKFM SLERIKTFEE LLPNATSMFD DYQGDSSTLL ASPMLKRFTW 
TDSKPKASLK IDLRVTSKSK ESGLSDVSRP SFCHSSCGHV SEGKRRFTYD HAELERKIAC 
CSPPPDYNSV VLYSTPPI 



ACJ9 Protein sequence:

Gene name: Purine nucleoside phosphorylase 
Unigene number: Hs.75514
Probeset Accession #: K02574 
Protein Accession #: CAA25320 
Cellular Localization: predicted cytoplasmic 
Summary: likely to catalyze the reversible phosphorolytic cleavage of purine
ribonucleosides and 2'-deoxyribonucleosides

MENGYTYEDY KKTAEWLLSH TKHRPQVAII CGSGLGGLTD KLTQAQIFDY SEIPNFPRST
VPGHAGRLVF GFLNGRACVM MQGRFHMYEG YPLWKVTFPV RVFHLLGVDT LWTNAAGGL
NPKFEVGDIM LIRDHINLPG FSGQNPLRGP NDERFGDRFP AMSDAYDRTM RQRALSTWKQ 
MGEQRELQEG TYVMVAGPSF ETVAECRVLQ KLGADAVGMS TVPEVIVARH CGLRVFGFSL
ITNKVIMDYE SLEKANHEEV LAAGKQAAQK LEQFVSILMA SIPLPDKAS 
ACK4 Protein sequence
Gene name: EST 
Probeset Accession #: R68763
Predicted amino acid seq: FGENESH exon prediction on BAC clone AC009414

Predicted nuclear target motifs (underlined), by starting residue: 25 (4) RRRP;
176 (5) RRRR; 177 (5) RRRR; 239 (5) KRKK; 399 (4) PPRARRT; 400 (5) PRARRTE
Cellular localization: predicted nuclear


MPPEQHHQPN KVSPKLCSAQ PAPRGRRRPG GRGPAAGGRT FANARFVLGE GVAIERGADD 60 
TTQPPVAGSV NPEGAAAALV PLAGARVAAA ADALHDAPRA VPGLLALGLV TGQADQRPGA 120 
GARQQQQQPQ QRDQEVPAAG QPPVPRHQVH PPAPPPPPPR SRAGSGAGAL PCAGHTRRRR 180
RTSSPRSSPP LSGPPGRASP RGARPPPLLR AAPTPSPRAL APAAASPPPP PPPPGREGEK 240 
RKKFPPGSSG STQTSGAAAA VAAALGSSPG RRRLLPLLLR VGRPRSGAAS GPVPASRAAE 300
WARWRSTRSA ASAPRAPLAS LLRRSSGRLF MAGASAARAA PSPILPPPPD LPPTPTRRAP 360
LIGCPPSPAR PAPSASPSPS RAAGPFLPPS HASTSSRSPP PRARRTEPAV PPSCGSGPGA 420
AGALRMGLGR TQRAARVAVS RALAGTVAAA AGLGARRARR LHLRGQIGVR RVAGTPEARG 480 
RGDGCSLGRV SPDRTPGKGS KGMEPPHTG 




AAA8 Protein sequence:

Gene name: ETL protein, with extended open reading frame

Unigene number: Hs.57958

Probeset Accession #: D5&024

Protein Accession #: AAG33021

Transmembrane domains: predicted 454-470, 486-502, 511-527, 528-544, 556-572,
600-616, 642-661, 672-689 (underlined sequences)

Extended sequence: residues 1-564 were added to the sequence in AAG33021

Cellular Localization: predicted cell surface serpentine receptor



MKTAALTPPR SPPPPPLRPP PMKRLPLLW FSTLLNCSYT QNCTKTPCLP NAKCEIRNGI 60

EACYCNMGFS GNGVTICEDD NECGNLTQSC GENANCTNTE GSYYCMCVPG FRSSSNQDRF 120 

ITNDGTVCIE NVNANCHLDN VCIAANINKT LTKIRSIKEP VALLQEVYRN SVTDLSPTDI 180 

ITYIEILAES SSLLGYKNNT ISAKDTLSNS TLTEFVKTVN NFVQRDTFW WDKLSVNHRR 240

THIiTKLMHTV EQATLRISQS FQKTTEFDTN STDIALKVFF FDSYNMKHIH PHMNMDGDYI 300

NIFPKRKAAY DSNGNVAVAF LYYKSIGPLL SSSDNFLLKP QNYDNSEEEE RVISSVISVS 360 

MSSNPPTLYE LEKITFTLSH RKVTDRYRSL CAFWNYSPDT MNGSWSSEGC ELTYSNETHT 420 

SCRCNHLTHF AILMSSGPSI GIKDYNILTR ITOLGIIISL ICIiAICIFTF WFFSEIQSTR 480

TTIHKNLCCS LFLAELVFLV GINTNTNKLX SVSIIAGLLH YFFLAAFAWM CIEGIHLYLI 540

WGVIYNKGF LHKNFYIFGY LSPAVWGFS AALGYRYYGT TKVCWLSTET HFIWSFIGPA 600

CLIILVNLLA FGVIIYKVFR HTAGLKPEVS CFENIRSCAR GALALLFLLG TTWIFGVLHV 660
VHASWTAYLF TVSNAFOGM FIFLFLCVLS RKIQEEYYRL FKNVPCCFGC LR




AAC6 Protein sequence: 
Gene name: EST 
Unigene number: Hs.134797
Probeset Accession #: AA025351
Protein Accession #: BAB14599

Signal sequence: predicted 1-24 (first underlined sequence)
Extended sequence: second underlined sequence

MILSLLFSLG GPLGWGLLGA WAOASSTSLS DLQSSRTPGV WKAEAEDTSK DPVGRNWCPY 60
PMSKLVTLLA LCKTEKFLIH SQQPCPQGAP DCQKVKVMYR MAHKPVYQVK QKVLTSLAWR 120
CCPGYTGPNC EHHDSMAIPE PADPGDSHQE PQDGPVSFKP GHLAAVINEV EVQQEQQEHL 180
LGDLQNDVHR VADSLPGLWK ALPGNLTAAV MEANQTGHEF PDRSLEQVLL PHVDTFLOVH 240
FSPIWRSFNO SLHSLTOAIR NLSLDVEANR OAISRVQDSA VARADFOELG AKFEAKVOEN 300
TORVGOLROD VEDRLHAQ^F TLHRSISELQ ADVDTKLKRL HKAOEAPGTN GSLVLATPGA 360
GARPEPDSIiO ARLGOLOF • i-i SELHMTTARR EEELOYTLED MRATLTRHVD EIKELYSESD 420
ETFDOISKVE ROVEELOVx^H TALRELRVIL MEKSLIMEEN KEEVEROLLE LNLTLOHLQG 480 
GHADLIKYVK DCNCQKLYLD LDVIREGORD ATRALEETQV SLDERROLDG SSLQALQNAV 54 0 
DAVSLAVDAH KAEGERARAA TSRLRSQVQA LDDEVGALKA AAAEARHEVR OLHSAFAALL 600
EDALRHEAVL AALFGEEVLE EMSEOTPGPL PLSYEQIRVA LODAASGLOE OALGWDELAA 660 
RVTALEOASE PPRPAEHLEP SHDAGREEAA TTALAGLARE hOSLSNDVKN VGRCCEAEAG 720
AGAASLNASL DGLHNALFAT ORSLEOHORL FHSLFGNFQG LMEANVSLDL GKLQTMLSRK 780
GKKOQKDLEA PRKRDKKEAE PLVDXRVTGP VPGALGAALW EASPVAFYAS FSEGTAALOT 840 
VKFNTTYINI GSSYFPEHGY FRAPERGVYL FAVSVEFGPG PGTGOLVFGG HHRTPVCTTG 900 
OGSGSTATVF AMAELOKGER VWFELTOGSI TKRSLSGTAF GGFLMFKT






ACH7 Protein sequence: 

Gene name: EST 

Unigene number: Hs.3807

Probeset Accession #: AA292694 

BAC Accession #: AL161751 

FGENESH predicted aa seq: 1-647; based on BAC clone AL161751 



MGKDFMTKTP KAFATKAKID KWDLIKLKSF CTAKETIIRV NSQPTDWQKT FAIYPSDKGV 60 

IARIYKELEQ IYKKKKPTKT LRTHFLSRPK GNCWPLGPRG DSWQLGGPSG ARAEGKGGGT 120 

GLGKPAVEGG DRAPDTALRP RAGQIQVGSS SACGASENEA GVRPVPPLAG ALARAGRRRT 180 

PHCRPCWLLG LGGLLQPAPR YHEAAGGRGG LHPARWGAQH RACGRRAARC ARAPAGRPRA 240

RRGLQRPAVL GRTGAQAFPL HPGERAFAGF LLAVLRPRRS RKRHAAVGGG APTLLHRAEM 300

RGTPGHRWGR ARSWKEMRCH LRANGYLCKY QFEVLCPAPR PGAASNLSYR APFQLHSAAL 360 

DFSPPGTEVS ALCRGQLPIS VTCIADEIGA RWDKLSGDVL CPCPGRYLRA GKCAELPNCL 420 

DDLGGFACEC ATGFELGKDG RSCVTSGEGQ PTLGGTGVPT RRPPATATSP VPQRTWPIRV 480

DEKLGETPLV PEQDNSVTSI PEIPRWGSQS TMSTLQMSLQ AESKATITPS GSVISKFNST 540 

TSSATPQAFD SSSAWFIFV STAWVLVIL TMTVLGLVKL CFHESPSSQP RKESMGPPGL 600
ESDPEPAALG SSSAHCTNNG VKVGDCDLRD RAEGALLAES PLGSSDA

AAD4 Protein sequence

Gene name: ERG

Unigene number: Hs.45514

Probeset Accession #: R32894

Protein Accession #: AAA52398

Signal sequence: none

Transmembrane domains: none

PFAM domains: predicted Ets-domain 294-373; SAM_PNT: 122-206

Summary: ERG2 is a sequence-specific DNA-binding protein.

MIQTVPDPAA HIKEALSWS EDQSLFECAY GTPHLAKTEM TASSSSDYGQ TSKMSPRVPQ 60

QDWLSQPPAR VTIKMECNPS QVNGSRNSPD ECSVAKGGKM VGSPDTVGMN YGSYMEEKHM 120

PPPNMTTNER RVIVPADPTL WSTDHVRQWL EWAVKEYGLP DVNILLFQNI DGKELCKMTK 180

DDFQRLTPSY NADILLSHLH YLRETPLPHL TSDDVDKALQ NSPRLMHARN TDLPYEPPRR 240

SAWTGHGHPT PQSKAAQPSP STVPKTEDQR PQLDPYQILG PTSSRLANPG SGQIQLWQFL 300

LELLSDSSNS SCITWEGTNG EFKMTDPDEV ARRWGERKSK PNMNYDKLSR ALRYYYDKNI 360

MTKVHGKRYA YKFDFHGIAQ ALQPHPPESS LYKYPSDLPY MGSYHAHPQK MNFVAPHPPA 420

LPVTSSSFFA APNPYWNSPT GGIYPNTRLP TSHMPSHLGT YY 462 

AAD5 Protein sequence 

Gene name: activin A receptor type II-like 1 (ALK-1)

Unigene number: Hs.172670
Probeset Accession #: T57112
Protein Accession #: NP_000011
Signal sequence: predicted 1-21 

Transmembrane domain: predicted 119-135

PFAM domains: predicted pkinase 204-489 

Summary: Type Ia membrane protein; receptor serine/threonine kinase

MTLGSPRKGT LMLLMALVTO GDPVKPSRGP LVTCTCESPH CKGPTCRGAW CTWLVREEG 60

RHPQEHRGCG NLHRELCRGR PTEFVNHYCC DSHLCNHNVS LVLEATQPPS EQPGTDGQLA 120

LILGPVLALL ALVALGVLGL WHVRRRQEKQ RGLHSELGES SLILKASEQG DTMLGDLLDS 180

DCTTGSGSGL PFLVQRTVAR QVALVECVGK GRYGEVWRGL WHGESVAVKI FSSRDEQSWF 240

RETEIYNTVL LRHDNILGFI ASDMTSRNSS TQLWLITHYH EHGSLYDFLQ RQTLEPHLAL 300

RLAVSAACGL AHLHVEIFGT QGKPAIAHRD FKSRNVLV^S NLQCCIADLG LAVMHSQGSD 360

YLDIGNNPRV GTKRYMAPEV LDEQIRTDCF ESYKWTD1" A FGLVLWEIAR RTIVNGIVED 420

YRPPFYDWP NDPSFEDMKK WCVDQQTPT IPNRLAADi/V LSGLAQMMRE CWYPNPSARL 480 
TALRIKKTLQ KISNSPEKPK VIQ

AAD8 Protein sequence
Gene name: ESTs 
Unigene number: Hs.144953
Probeset Accession #: AA404418 
Protein Accession #: n/a
Signal sequence: n/a
Transmembrane domains: n/a
PFAM domains: n/a 

Summary: no ORF identified; possible frameshifts. Located near the PCTAIRE protein
kinase 2 (PCTK2) gene in the genome (within 100 kb).



ACA2 Protein sequence 
Gene name: EST 
Unigene number: Hs.16450
Probeset Accession #: AA478778 
Protein Accession #: n/a 
Signal sequence: n/a 
Transmembrane domains: n/a 
PFAM domains: n/a 

Summary: no ORF identified; possible frameshifts. Although a match was found to
the HTGS genomic sequence, the sequence does not extend far enough upstream to
predict coding exons.

ACA4 Protein sequence 

Gene name: alpha satellite junction DNA sequence 

Unigene number: Hs.247946

Probeset Accession #: M21305

Protein Accession #: AAA88020 

Signal sequence: none 

Transmembrane domains: none

PFAM domains: none


MEWNGMAWNR IKWNGINSSG MEWNGMEWNA VQCNRMEWNE LELTGMEWNG MHLN 



ACG6 Protein sequence 

Gene name: intercellular adhesion molecule 2 (ICAM2) 

Unigene number: Hs.83733

Probeset Accession #: M32334 

Protein Accession #: NP_000864 

Signal sequence: predicted 1-21 

Transmembrane domain: predicted 224-248

PFAM domains: predicted 41-98, 127-197; immunoglobulin-like C2-type domains
Summary: a predicted Type Ia membrane protein; it plays a role in cell adhesion
and is a ligand for the LFA-1 protein. ICAM2 is also called CD102.

MSSFGYRTLT VALFTLICCP GSDEKVFEVH VRPKKLAVEP KGSLEVNCST TCNQPEVGGL 60 

ETSLNKILLD EQAQWKHYLV SNISHDTVLQ CHFTCSGKQE SMNSNVSVYQ PPRQVILTLQ 120 

PTLVAVGKSF TIECRVPTVE PLDSLTLFLF RGNETLHYET FGKAAPAPQE ATATFNSTAD 180 

REDGHRNFSC LAVLDLMSRG GNIFHKHSAP KMLEIYEPVS DSQMVIIVTV VSVIiLSLFVT 240
SVLLCFIFGQ HLRQQRMGTY GVRAAWRRLP QAFRP 



ACG7 Protein sequence 

Gene name: Cadherin 5, VE-cadherin (CDH5) 
Unigene number: Hs.76206 
Probeset Accession #: X79981
Protein Accession #: NP_001786
Signal sequence: predicted 1-27 
Transmembrane domain: predicted 604-620 

PFAM domains: Cadherin domains predicted 53-141, 156-249, 263-364, 377-470, and 
487-576 

Summary: Likely a Type I membrane protein. Cadherins are calcium-dependent
adhesive proteins that mediate cell-to-cell interaction. VE-cadherin is associated
with intercellular junctions.

MQRLMMLLAT SGACLGLLAV AAVAAAGANP AQRDTHSLLP THRRQKRDWI WNQMHIDEEK 60 
NTSLPHHVGK IKSSVSRKNA KYLLKGEYVG KVFRVDAETG DVFAIERLDR ENISEYHLTA 120 
VIVDKDTGEN LETPSSFTIK VHDVNDNWPV FTHRLFNASV PESSAVGTSV ISVTAVDADD 180
PTVGDHASVM YQILKGKEYF AIDNSGRIIT ITKSLDREKQ ARYEIWEAR DAQGLRGDSG 240
TATVLVTLQD INDNFPFFTQ TKYTFWPED TRVGTSVGSL FVEDPDEPQN RMTKYSILRG 300
DYQDAFTIET NPAHNEGIIK PMKPLDYEYI QQYSFIVEAT DPTIDLRYMS PPAGNRAQVI 360

INITDVDEPP IFQQPFYHFQ LKENQKKPLI GTVLAMDPDA ARHSIGYSIR RTSDKGQFFR 420 

VTKKGDIYNE KELDREVYPW YNLTVEAKEL DSTGTPTGKE SIVQVHIEVL DENDNAPEFA 480

KPYQPKVCEN AVHGQLVLQI SAIDKDITPR NVKFKFTLNT ENNFTLTDNH DNTANITVKY 540

GQFDREHTKV HFLPWISDN GMPSRTGTST LTVAVCKCNE QGEFTFCEDM AAQVGVSIQA 600 

WAILLCILT ITVITLLIFL RRRLRKQARA HGKSVPEIHE QLVTYDEEGG GEMDTTSYDV 660 

SVLNSVRRGG AKPPRPALDA RPSLYAQVQK PPRHAPGAHG GPGEMAAMIE VKKDEADHDG 720 

DGPPYDTLHI YGYEGSESIA ESLSSLGTDS SDSDVDYDFL NDWGPRFKML AELYGSDPRE 780 
ELLY 



ACG9 Protein sequence 

Gene name: lysyl oxidase-like 2 (LOXL2)
Unigene number: Hs.83354 
Probeset Accession #: U89942 
Protein Accession #: NP_002309 
Signal sequence: predicted 1-25 
Transmembrane domains: none predicted 

PFAM domains: scavenger receptor cysteine-rich domains predicted 68-159, 203-233, 
336-425, 439-528; Lysyl oxidase predicted 548-749. 

Summary: Likely a secreted protein. Lysyl oxidase is a copper-dependent amine
oxidase that belongs to a heterogeneous family of enzymes that oxidize primary
amine substrates to reactive aldehydes, acting on extracellular matrix
substrates such as collagen and elastin.
ACH2 Protein sequence

Gene name: TIE tyrosine-protein kinase 

Unigene number: Hs.78824

Probeset Accession #: X60957 

Protein Accession #: NP_005415 

Signal sequence: predicted 1-21 

Transmembrane domain: predicted 770-786 

PFAM domains: laminin-EGF predicted 234-267; FN3 predicted 460-520, 548-632, and 
644-729; tyrosine kinase predicted 839-1107 

Summary: Likely a Type Ia membrane protein; TIE is a receptor tyrosine kinase with
an unknown ligand; its expression is likely necessary for normal blood vessel
development.
INLLGACKNR GYLYIAIEYA PYGNLLDFLR KSRVLETDPA FAREHGTAST LSSRQLLRFA 960

SDAANGMQYL SEKQFIHRDL AARNVLVGEN LASKIADFGL SRGEEVYVKK TMGRLPVRWM 1020 

AIESLNYSVY TTKSDVWSFG VLLWEIVSLG GTPYCGMTCA ELYEKLPQGY RMEQPRNCDD 1080 
EVYELMRQCW RDRPYERPPF AQIALQLGRM LEARKAYVNM SLFENFTYAG IDATAEEA 



ACH3 Protein sequence 

Gene name: placental growth factor (PGF; PlGF1; VEGF-related protein)

Unigene number: Hs.2894 

Probeset Accession #: X54936

Protein Accession #: NP_002623

Signal sequence: predicted 1-21 

Transmembrane domain: none predicted 

PFAM domains: PDGF predicted 52-130 

Summary: Likely a secreted protein; likely regulates angiogenesis by interacting
with FLT1 and FLK1.

MPVMRLFPCF LQLLAGLALP AVPPQQWALS AGNGSSEVEV VPFQEVWGRS YCRALERLVD 60 
WSEYPSEVE HMFSPSCVSL LRCTGCCGDE NLHCVPVETA NVTMQLLKIR SGDRPSYVEL 120 
TFSQHVRCEC RPLREKMKPE RCGDAVPRR 



ACH4 Protein sequence 
Gene name: nidogen 2 (NID2) 
Unigene number: Hs.82733
Probeset Accession #: D86425
Protein Accession #: NP_031387
Signal sequence: predicted 1-30 

Transmembrane domain: none predicted

PFAM domains: EGF-like_domains predicted 489-524, 764-800, 806-843, 853-891, and 
897-930; thyroglobulin_repeats predicted 941-1006, and 1020-1085; 
LDL_receptor_repeats predicted 1155-1197, 1199-1240, and 1242-1285. 

Summary: A secreted protein; NID2 likely interacts with collagens I and IV and 
laminin-1 to promote cell adhesion to the basement membrane. 

MEGDRVAGRP VLSSLPVLLL LQLLMLRAAA LHPDELFPHG ESWWDQLLQE GDDVKLSRGE 60 

AGESPALLTK PDSATSTWAP TASSPLRTSP GKRSMWTMIS PPTSRPSPLF WRTSTRATAE 120 

AESCTERTPP PQCWAWPPAM CALASRALRA FYPHPRLPGH LGAGRRLRGG QTRALPSGEL 180 

NTFQAVLASD GSDSYALFLY PANGLQFLGT RPKESYNVQL QLPARVGFCR GEADDLKSEG 240 

PYFSLTSTEQ SVKNLYQLSN LGIPGVWAFH IGSTSPLDNV RPAAVGDLSA AHSSVPLGRS 300

FSHATALESD YNEDNLDYYD VNEEEAEYLP GEPEEALNGH SSIDVSFQSK VDTKPLEESS 360 

TLDPHTKEGT SLGEVGGPDL KGQVEPWDER ETRSPAPPEV DRDSLAPSWE TPPPYPENGS 420 

IQPYPDGGPV PSEMDVPPAH PEEEIVLRSY PASGHTTPLS RGTYEVGLED NIGSNTEVFT 480 

YNAANKETCE HNHRQCSRHA FCTDYATGFC CHCQSKFYGN GKHCLPEGAP HRVNGKVSGH 540 

LHVGHTPVHF TDVDLHAYIV GNDGRAYTAI SHIPQPAAQA LLPLTPIGGL FGWLFALEKP 600 

GSENGFSLAG AAFTHDMEVT FYPGEETVRI TQTAEGLDPE NYLSIKTNIQ GQVPYVPANF 660 

TAHISPYKEL YHYSDSTVTS TSSRDYSLTF GAINQTWSYR IHQNITYQVC RHAPRHPSFP 720

TTQQLNVDRV FALYNDEERV LRFAVTNQIG PVKEDSDPTP VNPCYDGSHM CDTTARCHPG 780 
TGVDYTCECA SGYQGDGRNC VDENECATGF HRCGPNSVCI NLPGSYRCEC RSGYEFADDR 840

HTCILITPPA NPCEDGSHTC APAGQARCVH HGGSTFSCAC LPGYAGDGHQ CTDVDECSEN 900 

RCHPAATCYN TPGSFSCRCQ PGYYGDGFQC IPDSTSSLTP CEQQQRHAQA QYAYPGARFH 960 

IPQCDEQGNF LPLQCHGSTG FCWCVDPDGH EVPGTQTPPG STPPHCGPSP EPTQRPPTIC 1020 

ERWRENLLEH YGGTPRDDQY VPQCDDLGHF IPLQCHGKSD FCWCVDKDGR EVQGTRSQPG 1080

TTPACIPTVA PPMVRPTPRP DVTPPSVGTF LLYTQGQQIG YLPLNGTRLQ KDAAKTLLSL 1140 

HGSIIVGIDY DCRERMVYWT DVAGRTISRA GLELGAEPET IVNSGLISPE GLAIDHIRRT 1200

MYWTDSVLDK IESALLDGSE RKVLFYTDLV NPRAIAVDPI RGNLYWTDWN REAPKIETSS 1260

LDGENRRILI NTDIGLPNGL TFDPFSKLLC WADAGTKKLE CTLPDGTGRR VIQNNLKYPF 1320
SIVSYADHFY HTDWRRDGW SVNKHSGQFT DEYLPEQRSH LYGITAVYPY CPTGRK 



ACH5 Protein sequence 

Gene name: SNL (singed-like; sea urchin fascin homolog-like)

Unigene number: Hs.118400

Probeset Accession #: U03057 

Protein Accession #: NP_003079 

Signal sequence: none identified 

Transmembrane domain: none identified 

PFAM domains: none identified 
Summary: a cytoplasmic, actin-bundling protein that is likely to be involved in
the assembly of actin filament bundles present in microspikes, membrane ruffles,
and stress fibers.

MTANGTAEAV QIQFGLINCG NKYLTAEAFG FKVNASASSL KKKQIWTLEQ PPDEAGSAAV 60 
CLRSHLGRYL AADKDGNVTC EREVPGPDCR FLIVAHDDGR WSLQSEAHRR YFGGTEDRLS 120 
CFAQTVSPAE KWSVHIAMHP QVNIYSVTRK RYAHLSARPA DEIAVDRDVP WGVDSLITLA 180 
FQDQRYSVQT ADHRFLRHDG RLVARPEPAT GYTLEFRSGK VAFRDCEGRY LAPSGPSGTL 240 
KAGKATKVGK DELFALEQSC AQWLQAANE RNVSTRQGMD LSANQDEETD QETFQLEIDR 300 
DTKKCAFRTH TGKYWTLTAT GGVQSTASSK NASCYFDIEW RDRRITLRAS NGKFVTSKKN 360 
GQLAASVETA GDSELFLMKL INRPIIVFRG EHGFIGCRKV TGTLDANRSS YDVFQLEFND 420 
GAYNIKDSTG KYWTVGSDSA VTSSGDTPVD FFFEFCDYNK VAIKVGGRYL KGDHAGVLKA 480
SAETVDPASL WEY 



ACH6 Protein sequence 

Gene name: endothelial protein C receptor (EPCR; PROCR) 

Unigene number: Hs.82353 

Probeset Accession #: L35545

Protein Accession #: NP_006395 

Signal sequence: predicted 1-17 

Transmembrane domain: predicted 211-227 

PFAM domains: none identified 

Summary: a Type Ia membrane protein; EPCR likely binds thrombin-activated protein C,
the product of a vitamin K-dependent serine protease zymogen involved in the
regulation of blood coagulation.

MLTTLLPILL LSGWAFCSQD ASDGLQRLHM LQISYFRDPY HVWYQGNASL GGHLTHVLEG 60 
PDTNTTIIQL QPLQEPESWA RTQSGLQSYL LQFHGLVRLV HQERTLAFPL TIRCFLGCEL 120
PPEGSRAHVF FEVAVNGSSF VSFRPERALW QADTQVTSGV VTFTLQQLNA YNRTRYELRE 180
FLEDTCVQYV QKHISAENTK GSQTSRSYTS LVLGVLVGGF IIAGVAVGIF LCTGGRRC 



ACH8 Protein sequence 

Gene name: melanoma adhesion molecule (MCAM; MUC18) 

Unigene number: Hs.211579

Probeset Accession #: D51069 

Protein Accession #: NP_006491 

Signal sequence: predicted 1-17 

Transmembrane domain: predicted 559-575 

PFAM domains: immunoglobulin_domains predicted 264-324, and 356-410. 
Summary: a Type Ia membrane protein, associated with tumor progression and the
development of metastasis in human malignant melanoma; it may play a role in
neural crest cells during embryonic development.

MGLPRLVCAF LLAACCCCPR VAGVPGEAEQ PAPELVEVEV GSTALLKCGL SQSQGNLSHV 60 
DWFSVHKEKR TLIFRVRQGQ GQSEPGEYEQ RLSLQDRGAT LALTQVTPQD ERIFLCQGKR 120 
PRSQEYRIQL RVYKAPEEPN IQVNPLGIPV NSKEPEEVAT CVGRNGYPIP QVIWYKNGRP 180
LKEEKNRVHI QSSQTVESSG LYTLQSILKA QLVKEDKDAQ FYCELNYRLP SGNHMKESRE 240
VTVPVFYPTE KVWLEVEPVG MLKEGDRVEI RCLADGNPPP HFSISKQNPS TREAEEETTN 300 
DNGVLVLEPA RKEHSGRYEC QAWNLDTMIS LLSEPQELLV NYVSDVRVSP AAPERQEGSS 360
LTLTCEAESS QDLEFQWLRE ETDQVLERGP VLQLHDLKRE AGGGYRCVAS VPSIPGLNRT 420 
QLVKLAIFGP PWMAFKERKV WVKENMVLNL SCEASGHPRP TISWNVNGTA SEQDQDPQRV 480
LSTLNVLVTP ELLETGVECT ASNDLGKNTS ILFLELVNLT TLTPDSNTTT GLSTSTASPH 540
TRANSTSTER KLPEPESRGV VIVAVIVCIL VLAVLGAVLY FLYKKGKLPC RRSGKQEITL 600 
PPSRKTELW EVKSDKLPEE MGLLQGSSGD KRAPGDQGEK YIDLRH 



ACH9 Protein sequence 

Gene name: endothelin-1 (EDN1) 

Unigene number: Hs.2271 

Probeset Accession #: J05008 

Protein Accession #: NP_001946 

Signal sequence: predicted 1-17 

Transmembrane domain: none predicted 

PFAM domains: Endothelin domains predicted 59-73, and 108-129. 
Summary: a secreted zymogen; the active protein is likely a 21-amino acid peptide
with potent mammalian vasoconstrictor activity; it is necessary for normal vessel
development.

MDYLLMIFSL LFVACQGAPE TAVLGAELSA VGENGGEKPT PSPPWRLRRS KRCSCSSLMD 60
KECVYFCHLD IIWVNTPEHV VPYGLGSPRS KRALENLLPT KATDRENRCQ CASQKDKKCW 120 
NFCQAGKELR AEDIMEKDWN NHKKGKDCSK LGKKCIYQQL VRGRKIRRSS EEHLRQTRSE 180 
TMRNSVKSSF HDPKLKGKPS RERYVTHNRA HW 



ACJ1 Protein sequence 

Gene name: BMX non-receptor tyrosine kinase 
Unigene number: Hs.27372
Probeset Accession #: X83107 
Protein Accession #: NP_001712 
Signal sequence: none identified 
Transmembrane domain: none identified 

PFAM domains: pleckstrin_homology_domain predicted 6-111; SH2_domain predicted
294-383; protein_kinase_domain predicted 417-663

Summary: a cytoplasmic protein; it likely plays a role in the growth and
differentiation of hematopoietic cells and is also known to be expressed in
endothelial cells.

MDTKSILEEL LLKRSQQKKK MSPNNYKERL FVLTKTNLSY YEYDKMKRGS RKGSIEIKKI 60 
RCVEKVNLEE QTPVERQYPF QIVYKDGLLY VYASNEESRS QWLKALQKEI RGNPHLLVKY 120 
HSGFFVDGKF LCCQQSCKAA PGCTLWEAYA NLHTAVNEEK HRVPTFPDRV LKIPRAVPVL 180
KMDAPSSSTT LAQYDNESKK NYGSQPPSSS TSLAQYDSNS KKIYGSQPNF NMQYIPREDF 240 
PDWWQVRKLK SSSSSEDVAS SNQKERNVNH TTSKISWEFP ESSSSEEEEN LDDYDWFAGN 300 
ISRSQSEQLL RQKGKEGAFM VRNSSQVGMY TVSLFSKAVN DKKGTVKHYH VHTNAENKLY 360 
LAENYCFDSI PKLIHYHQHN SAGMITRLRH PVSTKANKVP DSVSLGNGIW ELKREEITLL 420 
KELGSGQFGV VQLGKWKGQY DVAVKMIKEG SMSEDEFFQE AQTMMKLSHP KLVKFYGVCS 480
KEYPIYIVTE YISNGCLLNY LRSHGKGLEP SQLLEMCYDV CEGMAFLESH QFIHRDLAAR 540
NCLVDRDLCV KVSDFGMTRY VLDDQYVSSV GTKFPVKWSA PEVFHYFKYS SKSDVWAFGI 600 
LMWEVFSLGK QPYDLYDNSQ WLKVSQGHR LYRPHLASDT IYQIMYSCWH ELPEKRPTFQ 660 
QLLSSIEPLR EKDKH 



ACJ4 Protein sequence 

Gene name: prostaglandin G/H synthase 2 (COX-2; PGHS-2) 

Unigene number: Hs.196384

Probeset Accession #: D28235 

Protein Accession #: NP_000954 

Signal sequence: predicted 1-17 

Transmembrane domain: none identified 

PFAM domains: EGF-like_domain predicted 18-55. 

Summary: a microsomal enzyme; COX-2 is the therapeutic target of nonsteroidal
anti-inflammatory drugs (NSAIDs), such as aspirin.

MLARALLLCA VLALSHTANP CCSHPCQNRG VCMSVGFDQY KCDCTRTGFY GENCSTPEFL 60 
TRIKLFLKPT PNTVHYILTH FKGFWNWNN IPFLRNAIMS YVLTSRSHLI DSPPTYNADY 120
GYKSWEAFSN LSYYTRALPP VPDDCPTPLG VKGKKQLPDS NEIVEKLLLR RKFIPDPQGS 180 
NMMFAFFAQH FTHQFFKTDH KRGPAFTNGL GHGVDLNHIY GETLARQRKL RLFKDGKMKY 240
QIIDGEMYPP TVKDTQAEMI YPPQVPEHLR FAVGQEVFGL VPGLMMYATI WLREHNRVCD 300 
VLKQEHPEWG DEQLFQTSRL ILIGETIKIV IEDYVQHLSG YHFKLKFDPE LLFNKQFQYQ 360 
NRIAAEFNTL YHWHPLLPDT FQIHDQKYNY QQFIYNNSIL LEHGITQFVE SFTRQIAGRV 420 
AGGRNVPPAV QKVSQASIDQ SRQMKYQSFN EYRKRFMLKP YESFEELTGE KEMSAELEAL 480
YGDIDAVELY PALLVEKPRP DAIFGETMVE VGAPFSLKGL MGNVICSPAY WKPSTFGGEV 540
GFQIINTASI QSLICNNVKG CPFTSFSVPD PELIKTVTIN ASSSRSGLDD INPTVLLKER 600 
STEL 



ACJ6 Protein sequence
Gene name: SEC14-like 1
Unigene number: Hs.75232 
Probeset Accession #: D67029 
Protein Accession #: NP_002994 
Signal sequence: none identified 
Transmembrane domain: none identified 



PFAM domains: none identified 
Summary: a cytoplasmic protein 

MVQKYQSPVR VYKYPFELIM AAYERRFPTC PLIPMFVGSD TVSEFKSEDG AIHVIERRCK 60 

LDVDAPRLLK KIAGVDYVYF VQKNSLNSRE RTLHIEAYNE TFSNRVIINE HCCYTVHPEN 120

EDWTCFEQSA SLDIKSFFGF ESTVEKIAMK QYTSNIKKGK EIIEYYLRQL EEEGITFVPR 180 

WSPPSITPSS ETSSSSSKKQ AASMAWIPE AALKEGLSGD ALSSPSAPEP WGTPDDKLD 240

ADHIKRYLGD LTPLQESCLI RLRQWLQETH KGKIPKDEHI LRFLRARDFN IDKAREIMCQ 300 

SLTWRKQHQV DYILETWTPP QVLQDYYAGG WHHHDKDGRP LYVLRLGQMD TKGLVRALGE 360

EALLRYVLSV NEERLRRCEE NTKVFGRPIS SWTCLVDLEG LNMRHLWRPG VKALLRIIEV 420 

VEANYPETLG RLLILRAPRV FPVLWTLVSP FIDDNTRRKF LIYAGNDYQG PGGLLDYIDK 480 

EIIPDFLSGE CMCEVPEGGL VPKSLYRTAE ELENEDLKLW TETIYQSASV FKGAPHEILI 540 

QIVDASSVIT WDFDVCKGDI VFNIYHSKRS PQPPKKDSLG AHSITSPGGN NVQLIDKVWQ 600

LGRDYSMVES PLICKEGESV QGSHVTRWPG FYILQWKFHS MPACAASSLP RVDDVLASLQ 660 
VSSHKCKVMY YTEVIGSEDF RGSMTSLESS HSGFSQLSAA TTSSSQSHSS SMISR 



ACJ8 Protein sequence 

Gene name: intercellular adhesion molecule 1 (ICAM1; CD54) 

Unigene number: Hs.168383

Probeset Accession #: M24283 

Protein Accession #: NP_000192 

Signal sequence: predicted 1-27 

Transmembrane domain: predicted 481-497

PFAM domains: immunoglobulin_domains predicted 128-188, and 325-373. 
Summary: a Type Ia membrane protein; ICAM1 is typically expressed on endothelial
cells and cells of the immune system; ICAM1 binds to integrins of type CD11a/CD18
or CD11b/CD18; ICAM1 is also exploited by rhinovirus as a receptor.

MAPSSPRPAL PALLVLLGAL FPGPGNAQTS VSPSKVILPR GGSVLVTCST SCDQPKLLGI 60 

ETPLPKKELL LPGNNRKVYE LSNVQEDSQP MCYSNCPDGQ STAKTFLTVY WTPERVELAP 120 

LPSWQPVGKN LTLRCQVEGG APRANLTWL LRGEKELKRE PAVGEPAEVT TTVLVRRDHH 180 

GANFSCRTEL DLRPQGLELF ENTSAPYQLQ TFVLPATPPQ LVSPRVLEVD TQGTWCSLD 240 

GLFPVSEAQV HLALGDQRLN PTVTYGNDSF SAKASVSVTA EDEGTQRLTC AVILGNQSQE 300 

TLQTVTIYSF PAPNVILTKP EVSEGTEVTV KCEAHPRAKV TLNGVPAQPL GPRAQLLLKA 360

TPEDNGRSFS CSATLEVAGQ LIHKNQTREL RVLYGPRLDE RDCPGNWTWP ENSQQTPMCQ 420 

AWGNPLPELK CLKDGTFPLP IGESVTVTRD LEGTYLCRAR STQGEVTREV TVNVLSPRYE 480 
IVIITWAAA VIMGTAGLST YLYNRQRKIK KYRLQQAQKG TPMKPNTQAT PP 
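The coordinates in these entries (for example "Signal sequence: predicted 1-27" and "Transmembrane domain: predicted 481-497" for ICAM1) read as 1-based, inclusive residue positions. A minimal Python sketch of that mapping follows, assuming that convention; the helper name `region` is hypothetical, and the example uses only the first 30 residues of the ICAM1 listing above.

```python
def region(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq (1-based, both ends inclusive)."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"span {start}-{end} is outside 1-{len(seq)}")
    # Python slices are 0-based and end-exclusive, so only the start shifts.
    return seq[start - 1:end]


# First 30 residues of the ICAM1 listing, with the group spaces removed.
icam1_head = "MAPSSPRPAL" "PALLVLLGAL" "FPGPGNAQTS"

signal = region(icam1_head, 1, 27)  # "Signal sequence: predicted 1-27"
mature_start = icam1_head[27:]      # residues 28-30, after the signal peptide
print(signal, len(signal))          # MAPSSPRPALPALLVLLGALFPGPGNA 27
print(mature_start)                 # QTS
```

A 1-based inclusive span start..end is the 0-based half-open slice [start - 1, end), which is why only the start index is shifted.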



ACK3 Protein sequence 

Gene name: angiopoietin 1 receptor (TIE-2; TEK) 
Unigene number: Hs.89640 
Probeset Accession #: L06139 
Protein Accession #: NP_000450 
Signal sequence: predicted 1-18 
Transmembrane domain: predicted 746-770

PFAM domains: immunoglobulin_domains predicted 44-102, 370-424; EGF_like_domains 
predicted 210-252, 254-299, and 301-341; FN3_domains predicted 444-536, 541-634, 
and 638-732; protein_kinase_domain predicted 824-1096. 

Summary: a type Ia membrane protein; it is expressed almost exclusively in
endothelial cells in mice, rats, and humans; the ligand for this receptor is
angiopoietin-1; defects in TEK are associated with inherited venous malformations;
the TEK signaling pathway appears to be critical for endothelial cell-smooth muscle
cell communication in venous morphogenesis.

MDSLASXiVLC GVSLLLSGTV EGAMDLILIN SLPLVSDAET SLTCIASGWR PHEPITIGRD 60
FEALMNQHQD PLEVTQDVTR EWAKKVWKR EKASKINGAY FCEGRVRGEA IRIRTMKMRQ 120 
QASFLPATLT ^TVDKGDNVN ISFKKVLIKE EDAVIYKNGS FIHSVPRHEV PDILEVHLPH 180 
AQPQDAGVYS ^iRYIGGNLFT SAFTRLIVRR CEAQKWGPEC NHLCTACMNN GVCHEDTGEC 240 
ICPPGFMGRT CEKACELHTF GRTCKERCSG QEGCKSYVFC LPDPYGCSCA TGWKGLQCNE 300
ACHPGFYGPD CKLRCSCNNG EMCDRFQGCL CSPGWQGLQC EREGIPRMTP KIVDLPDHIE 360
VNSGKFNPIC KASGWPLPTN EEMTLVKPDG TVLHPKDFNH TDHFSVAIFT IHRILPPDSG 420
VWVCSVNTVA GMVEKPFNIS VKVLPKPLNA PNVIDTGHNF AVINISSEPY FGDGPIKSKK 480 
LLYKPVNHYE AWQHIQVTNE IVTLNYLEPR TEYELCVQLV RRGEGGEGHP GPVRRFTTAS 540
IGLPPPRGLN LLPKSQTTLN LTWQPIFPSS EDDFYVEVER RSVQKSDQQN IKVPGNLTSV 600 
LLNNLHPREQ YWRARVNTK AQGEWSEDLT AWTLSDILPP QPENIKISNI THSSAVISWT 660 
ILDGYSISSI TIRYKVQGKN EDQHVDVKIK NATIIQYQLK GLEPETAYQV DIFAENNIGS 720 
SNPAFSHELV TLPESQAPAD LGGGKMLLIA ILGSAGMTCL TVLLAFLIIL QLKRANVQRR 780

MAQAFQNVRE EPAVQFNSGT LALNRKVKNN PDPTIYPVLD WNDIKFQDVI GEGNFGQVLK 840 

ARIKKDGLRM DAAIKRMKEY ASKDDHRDFA GELEVLCKLG HHPNIINLLG ACEHRGYLYL 900 

AIEYAPHGNL LDFLRKSRVL ETDPAFAIAN STASTLSSQQ LLHFAADVAR GMDYLSQKQF 960 

IHRDLAARNI LVGENYVAKI ADFGLSRGQE VYVKKTMGRL PVRWMAIESL NYSVYTTNSD 1020 

VWSYGVLLWE IVSLGGTPYC GMTCAELYEK LPQGYRLEKP LNCDDEVYDL MRQCWREKPY 1080 
ERPSFAQILV SLNRMLEERK TYVNTTLYEK FTYAGIDCSA EEAA 



PZA6 Protein sequence 

Gene name: prostate differentiation factor (PLAB; MIC-1) 

Unigene number: Hs.116577

Probeset Accession #: AB000584 

Protein Accession #: NP_004855 

Signal sequence: predicted 1-29 

Transmembrane domain: none identified 

PFAM domains: TGF-beta_domain predicted 211-308.

Summary: a secreted protein; its exact function is unclear; it inhibits 
proliferation of primitive hematopoietic progenitors; it inhibits activation of 
macrophages; it is highly expressed in placenta and in serum of pregnant women; it 
may promote fetal survival by suppressing the production of maternally-derived
proinflammatory cytokines within the uterus. 

MPGQELRTVN GSQMLLVLLV LSWLPHGGAL SLAEASRASF PGPSELHSED SRFRELRKRY 60 
EDLLTRLRAN QSWEDSNTDL VPAPAVRILT PEVRLGSGGH LHLRISRAAL PEGLPEASRL 120 
HRALFRLiSPT ASRSWDVTRP LRRQLSLARP QAPALHLRLS PPPSQSDQLL AESSSARPQL 180
ELHLRPQAAR GRRRARARNG DDCPLGPGRC CRLHTVRASL EDLGWADWVL SPREVQVTMC 240
IGACPSQFRA ANMHAQIKTS LHRLKPDTEP APCCVPASYN PMVLIQKTDT GVSLQTYDDL 300 
LAKDCHCI 



AAD2 Protein sequence
Gene name: Thrombospondin-1
Unigene number: Hs.87409
Probeset Accession #: AA232645 
Protein Accession #: NP_003237.1 

Signal sequence: predicted 1-18 (first underlined sequence) 
Transmembrane Domain: none identified 

Summary: Thrombospondin is a large modular glycoprotein component of the 
extracellular matrix and contains a variety of distinct domains, including three 
repeating subunits (types I, II, and III) that share homology to an assortment of 
other proteins.

MGLAWGLGVL FLMHVCGTNR IPESGGDNSV FDIFELTGAA RKGSGRRLVK GPDPSSPAFR 60

IEDANLIPPV PDDKFQDLVD AVRAEKGFLL LASLRQMKKT RGTLIALERK DHSGQVFSW 120 

SNGKAGTLDL SLTVQGKQHV VSVEEALLAT GQWKSITLFV QEDRAQLYID CEKMENAELD 180 

VPIQSVFTRD LASIARLRIA KGGVNDNFQG VLQNVRFVFG TTPEDILRNK GCSSSTSVLL 240

TLDNNWNGS SPAIRTNYIG HKXKDLQAIC GISCDELSSM VLELRGLRTI VTTLQDSIRK 300 

VTEENKELAN ELRRPPLCYH NGVQYRNNEE WTVDSCTECH CQNSVTICKK VSCPIMPCSN 360 

ATVPDGECCP RCWPSDSADD GWSPWSEWTS CSTSCGNGIQ QRGRSCDSLN NRCEGSSVQT 420 

RTCHIQECDK RFKQDGGWSH WSPWSSCSVT CGDGVITRIR LCNSPSPQMN GKPCEGEARE 480

TKACKKDACP INGGWGPWSP WDICSVTCGG GVQKRSRLCN NPAPQFGGKD CVGDVTENQI 540

CNKQDCPIDG CLSNPCFAGV KCTSYPDGSW KCGACPPGYS GNGIQCTDVD ECKEVPDACF 600 

NHNGEHRCEN TDPGYNCLPC PPRFTGSQPF GQGVEHATAN KQVCKPRNPC TDGTHDCNKN 660 

AKCNYLGHYS DPMYRCECKP GYAGNGIICG EDTDLDGWPN ENLVCVANAT YHCKKDNCPN 720

LPNSGQEDYD KDGIGDACDD DDDNDKIPDD RDNCPFHYNP AQYDYDRDDV GDRCDNCPYN 780 

HNPDQADTDN NGEGDACAAD IDGDGILNER DNCQYVYNVD QRDTDMDGVG DQCDNCPLEH 840

NPDQLDSDSD RIGDTCDNNQ DIDEDGHQNN LDNCPYVPNA NQADHDKDGK GDACDHDDDN 900

DGIPDDKDNC RLVPNPDQKD SDGDGRGDAC rDDFDHDSVP DIDDICPENV DISETDFRRF 960 

QMIPLDPKGT SQNDPNWWR HQGKELVQTV tVCDPGLAVGY DEFNAVDFSG TFFINTERDD 1020 

DYAGFVFGYQ SSSRFYWMW KQVTQSYWDT fcPTRAQGYSG LSVKWNSTT GPGEHLRNAL 1080 

WHTGNTPGQV RTLWHDPRHI GWKDFTAYRW RLSHRPKTGF IRWMYEGKK IMADSGPIYD 1140 
KTYAGGRLGL FVFSQEMVFF SDLKYECRDP 



AAD9 protein sequence 

Gene name: LIM homeobox protein cofactor (CLIM-1)
Unigene number: Hs.4980



Probeset Accession #: F13782 
Protein Accession #: AAC83552 
Pfam: LIM bind 

Transmembrane Domain: none identified

Summary: The LIM homeodomain (LIM-HD) proteins, which contain two tandem LIM 
domains followed by a homeodomain, are critical transcriptional regulators of 
embryonic development. The LIM domain is a conserved cysteine-rich zinc-binding 
motif found in LIM-HD proteins, cytoskeletal components, LIM kinases, and other 
proteins. LIM domains are protein-protein interaction motifs, can inhibit binding 
of LIM-HD proteins to DNA, and can negatively regulate LIM-HD protein function. 

MSSTPHDPFY SSPFGPFYRR HTPYMVQPEY RIYEMNKRLQ SRTEDSDNLW WDAFATEFFE 60 

DDATLTLSFC LEDGPKRYTI GRTLIPRYFS TVFEGGVTDL YYILKHSKES YHNSSITVDC 120 

DQCTMVTQHG KPMFTKVCTE GRLILEFTFD DLMRIKTWHF TIRQYRELVP RSILAMHAQD 180

PQVLDQLSKN ITRMGLTNFT LNYLRLCVIL EPMQELMSRH KTYNLSPRDC LKTCLFQKWQ 240 

RMVAPPAEPT RQPTTKRRKR KNSTSSTSNS SAGNNANSTG SKKKTTAANL SLSSQVPDVM 300

WGEPTLMGG EFGDEDERLI TRLENTQYDA ANGMDDEEDF NNSPALGNNS PWNSKPPATQ 360 
ETKSENPPPQ ASQ 



AAE1 protein sequence 

Gene name: guanine nucleotide binding protein 11 
Unigene number: Hs.83381
Probeset Accession #: U31384
Protein Accession #: NP_004117.1 

Pfam: G-gamma; CAAX motif (farnesylation site) prediction underlined
Summary: The G gamma proteins are components of the heterotrimeric G proteins that
interact with cell surface receptors. The G protein beta and gamma subunits 
directly regulate the activities of various enzymes and ion channels after receptor 
ligation. Unlike most of the other known gamma subunits, gamma 11 is modified by a 
farnesyl group and is not capable of interacting with beta 2. 

MPALHIEDLP EKEKLKMEVE QLRKEVKLQR QQVSKCSEEI KNYIEERSGE DPLVKGIPED 60
KNPFKEKGSC VIS
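The GNG11 entry flags a C-terminal CAAX motif, the farnesylation signal in which a cysteine lies four residues from the carboxyl terminus, classically followed by two aliphatic residues and one variable residue. The Python sketch below is deliberately loose, an assumption rather than a curated rule: it requires only the cysteine at position -4, so it also accepts the -CCLA prenylation motif reported for cPLA2-gamma (ACF8) in this document. The function name is hypothetical.

```python
from typing import Optional


def caax_box(seq: str) -> Optional[str]:
    """Return the last four residues if they fit a loose CAAX pattern, else None.

    Simplification (assumed): only the cysteine four residues from the C
    terminus is checked; the two "A" positions and the final X are not.
    """
    residues = "".join(seq.split()).upper()  # drop the listing's group spaces
    if len(residues) >= 4 and residues[-4] == "C":
        return residues[-4:]
    return None


print(caax_box("KNPFKEKGSC VIS"))  # C-terminal residues of GNG11 -> CVIS
print(caax_box("PKDSARSCCL A"))    # C-terminal residues of ACF8 -> CCLA
```

A stricter variant would also test that the two middle residues are aliphatic, but that version would reject -CCLA, which this document describes as a prenylation motif.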



AAE2 protein sequence 

Gene name: Transcription factor 4 (Immunoglobulin transcription factor 2) (ITF-2) 
(SL3-3 Enhancer factor 2) (SEF-2) 
Unigene number: Hs.289068
Probeset Accession #: M74719
Protein Accession #: NP_003190.1
Pfam: HLH domain prediction underlined 

Summary: Transcription factor 4 is a helix-loop-helix (HLH) protein that belongs
to a family of nuclear proteins, designated SL3-3 enhancer factors 2 (SEF2), that
interact with an Ephrussi box-like motif within the glucocorticoid response element
in the enhancer of the murine leukemia virus SL3-3. Various cell types differ both
in the sets of SEF2-DNA complexes formed and in their amounts. Molecular analysis
of cDNA clones shows the existence of multiple related mRNA species containing
alternative coding regions, most probably the result of differential splicing.



MHHQQRMAAL GTDKELSDLL DFSAMFSPPV SSGKNGPTSL ASGHFTGSNV EDRSSSGSWG 60 

NGGHPSPSRN YGDGTPYDHM TSRDLGSHDN LSPPFVNSRI QSKTERGSYS SYGRESNLQG 120 

CHQQSLLGGD MDMGNPGTLS PTKPGSQYYQ YSSNNPRRRP LHSSAMEVQT KKVRKVPPGL 180 

PSSVYAPSAS TADYNRDSPG YPSSKPATST FPSSFFMQDG HHSSDPWSSS SGMNQPGYAG 240 

MLGNSSHIPQ SSSYCSLHPH ERLSYPSHSS ADINSSLPPM STFHRSGTNH YSTSSCTPPA 300

NGTDSIMANR GSGAAGSSQT GDALGKALAS IYSPDHTNNS FSSNPSTPVG S^SLSAGTA 360

VWSRNGGQAS SSPNYEGPLH SLQSRIEDRL ERLDDAIHVL RNHAVGPSTA M X5HGDMHG 420 

IIGPSHNGAM GGLGSGYGTG LLSANRHSLM VGTHREDGVA LRGSHSLLPN QVPVPQLPVQ 480

SATSPDLNPP QDPYRGMPPG LQGQSVSSGS SEIKSDDEGD ENLODTKSSE DKKLDDDKKD 540

IKSITSNNDD EDLTPEOKAE REKERRMANN ARERLRVRDI NEAFKELGRM VQLHLKSDKP 600

QTKLLILHQA VAVILSLEQQ VRERNLNPKA ACLKRREEEK VSSEPPPLSL AGPHPGMGDA 660
SNHMGQM 



AAE4 protein sequence 
Gene name: phosphatidylcholine 2-acylhydrolase

Unigene number: Hs.211587

Probeset Accession #: M68874 

Protein Accession #: AAA60105.1 

Pfam: PLA2 B, C2 domain prediction underlined 

Summary: Phospholipases A2 (PLA2s) play a key role in inflammatory processes
through production of precursors of eicosanoids and platelet-activating factor.
This PLA2 is a 100-kDa protein that contains a structural element homologous to
the C2 region of protein kinase C.

MSFIDPYQHI IVEHOYSHKF TVWLRATKV TKGAFGDMLD TPDPYVELFI STTPDSRKRT 60

RHFNNDINPV WNETFEFILD PNOENVLEIT LMDANYVMDE TLGTATFTVS SMKVGEKKEV 120

PFIFNQVTEM VLEMSLEVCS CPDLRFSMAL CDQEKTFRQQ RKEHIRESMK KLLGPKNSEG 180 

LHSARDVPW AILGSGGGFR AMVGFSGVMK ALYESGILDC ATYVAGLSGS TWYMSTLYSH 240 

PDFPEKGPEE INEELMKNVS HNPLLLLTPQ KVKRYVESLW KKKSSGQPVT FTDIFGMLIG 300

ETLIHNRMNT TLSSLKEKVN TAQCPLPLFT CLHVKPDVSE LMFADWVEFS PYEIGMAKYG 360 

TFMAPDLFGS KFFMGTWKK YEENPLHFLM GVWGSAFSIL FNRVLGVSGS QSRGSTMEEE 420 

LENITTKHIV SNDSSDSDDE SHEPKGTENE DAGSDYQSDN QASWIHRMIM ALVSDSALFN 480 

TREGRAGKVH NFMLGLNLNT SYPLSPLSDF ATQDSFDDDE LDAAVADPDE FERIYEPLDV 540 

KSKKIHWDS GLTFNLPYPL ILRPQRGVDL IISFDFSARP SDSSPPFKEL LLAEKWAKMN 600 

KLPFPKIDPY VFDREGLKEC YVFKPKNPDM EKDCPTIIHF VLANINFRKY KAPGVPRETE 660 

EEKEIADFDI FDDPESPFST FNFQYPNQAF KRLHDLMHFN TLNNIDVIKE AMVESIEYRR 720
QNPSRCSVSL SNVEARRFFN KEFLSKPKA 



ACA1 protein sequence 

Gene name: tissue factor pathway inhibitor 2 (TFPI2), placental protein 5 (PP5)
Unigene number: Hs.78045

Probeset Accession #: D29992

Protein Accession #: BAA06272.1 

Pfam: Kunitz BPTI 

Signal sequence: underlined 

Summary: ACA1 is a serine proteinase inhibitor that was originally purified from 
conditioned medium of the human glioblastoma cell line T98G. ACA1 is identical to 
placental protein 5 (PP5) and TFPI2, a placenta-derived glycoprotein with serine 
proteinase inhibitor activity. PP5 belongs to the Kunitz-type serine proteinase 
inhibitor family, having three putative Kunitz-type inhibitor domains. 

MDPARPLGLS ILLLFLTEAA LGDAAOEPTG NNAEICLLPL DYGPCRALLL RYYYDRYTQS 60

CRQFLYGGCE GNANNFYTWE ACDDACWRIE KVPKVCRLQV SVDDQCEGST EKYFFNLSSM 120 

TCEKFFSGGC HRNRIENRFP DEATCMGFCA PKKIPSFCYS PKDEGLCSAN VTRYYFNPRY 180
RTCDAFTYTG CGGNDNNFVS REDCKRACAK ALKKKKKMPK LRFASRIRKI RKKQF 



ACB8 protein sequence 
Gene name: myosin X 
Unigene number: Hs.61638
Probeset Accession #: N77151
Protein Accession #: NP_036466

Pfam: myosin head, IQ (calmodulin-binding motif), PH, MyTH4

Summary: Myosins are molecular motors that move along filamentous actin. Seven
classes of myosin are expressed in vertebrates: conventional myosin, or myosin-II,
as well as the six unconventional classes myosin-I, -V, -VI, -VII, -IX, and -X.

MDNFFTEGTR VWLRENGQHF PSTVNSCAEG IWFRTDYGQ VFTYKQSTIT HQKVTAMHPT 60 

NEEGVDDMAS LTELHGGSIM YNLFQRYKRN QIYTYIGSIL ASVNPYQPIA GLYEPATMEQ 120 

YSRRHLGELP PHIFAIANEC YRCLWKRYDN QCILISGESG AGKTESTKLI LKFLSVISQQ 180 

SLELSLKEKT SCVERAILES SPIMEAFGNA KTVYNNNSSR FGKFVQLNIC QKGNIQGGRI 240

VDYLLEKNRV VRQNPGERNY HIFYALLAGL EHEEREEFYL STPENYHYLN QSGCVEDKTI 300

SDQESFREVI TAMDVMQFSK EEVREVSRLL AGILHLGNIE FITAGGAQVS FKTALGRSAE 360 

LLGLDPTQLT DALTQRSMFL RGEEILTPLN VQQAVDSRDS LAMALYACCF EWVIKKINSR 420 

IKGNEDFKSI GILDIFGFEN FEVNHFEQFN INYANEKLQE YFNKHIFSLE QLEYSREGLV 480

WEDIDWIDNG ECLDLIEKKL GLLALINEES HFPQATDSTL LEKLHSQHAN NHFYVKPRVA 540 

VNNFGVKHYA GEVQYDVRGI LEKNRDTFR0 DLLNLLRESR FDFIYDLFEH VSSRNNQDTL 600

KCGSKHRRPT VSSQFKDSLH SLMATLSSSN PFFVRCIKPN MQKMPDQFDQ AWLNQLRYS 660 

GMLETVRIRK AGYAVRRPFQ DFYKRYKVLM RNLAIiPEDVR GKCTSLLQLY DASNSEWQLG 720

KTKVFLRESL EQKLEKRREE EVSHAAMVIR AHVLGFLARK QYRKVLYCW IIQKNYRAFL 780 

LRRRFLHLKK AAIVFQKQLR GQIARRVYRQ LLAEKREQEE KKKQEEEEKK KREEEERERE 840 
RERREAELRA QQEEETRKQQ ELEALQKSQK EAELTRELEK QKENKQVEEI LRLEKEIEDL 900

QRMKEQQELS LTEASLQKLQ ERRDQELRRL EEEACRAAQE FLESLNFDEI DECVRNIERS 960 

LSVGSEFSSE LAESACEEKP NFNFSQPYPE EEVDEGFEAD DDAFKDSPNP SEHGHSDQRT 1020 

SGIRTSDDSS EEDPYMNDTV VPTSPSADST VLLAPSVQDS GSLHNSSSGE STYCMPQNAG 1080 

DLPSPDGDYD YDQDDYEDGA ITSGSSVTFS NSYGSQWSPD YRCSVGTYNS SGAYRFSSEG 1140

AQSSFEDSEE DFDSRFDTDD ELSYRRDSVY SCVTLPYFHS FLYMKGGLMN SWKRRWCVLK 1200 

DETFLWFRSK QEALKQGWLH KKGGGSSTLS RRNWKKRWFV LRQSKLMYFE NDSEEKLKGT 1260 

VEVRTAKEII DNTTKENGID IIMADRTFHL IAESPEDASQ WFSVLSQVHA STDQEIQEMH 1320 

DEQANPQNAV GTLDVGLIDS VCASDSPDRP NSFVIITANR VLHCNADTPE EMHHWITLLQ 1380

RSKGDTRVEG QEFIVRGWLH KEVKNSPKMS SLKLKKRWFV LTHNSLDYYK SSEKNALKLG 1440 

TLVLNSLCSV VPPDEKIFKE TGYWNVTVYG RKHCYRLYTK LLNEATRWSS AIQNVTDTKA 1500 

PIDTPTQQLI QDIKENCLNS DWEQIYKRN PILRYTHHPL HSPLLPLPYG DINLNLLKDK 1560 

GYTTLQDEAI KIFNSLQQLE SMSDPIPIIQ GILQTGHDLR PLRDELYCQL IKQTNKVPHP 1620 

GSVGNLYSWQ ILTCLSCTFL PSRGILKYLK FHLKRIREQF PGTEMEKYAL FTYESLKKTK 1680 

CREFVPSRDE IEALIHRQEM TSTVYCHGGG SCKITINSHT TAGEWEKLI RGLAMEDSRN 1740 

MFALFEYNGH VDKAIESRTV VADVLAKFEK LAATSEVGDL PWKFYFKLYC FLDTDNVPKD 1800 

SVEFAFMFEQ AHEAVIHGHH PAPEENLQVL AALRLQYLQG DYTLHAAIPP LEEVYSLQRL 1860 

KARISQSTKT FTPCERLEKR RTSFLEGTLR RSFRTGSWR QKVEEEQMLD MWIKEEVSSA 1920 

RASIIDKWRK FQGMNQEQAM AKYMALIKEW PGYGSTLFDV ECKEGGFPQE LWLGVSADAV 1980 

SVYKRGEGRP LEVFQYEHIL SFGAPLANTY KIWDERELL FETSEWDVA KLMKAYISMI 2040 
VKKRYSTTRS ASSQGSSR 



ACC3 protein sequence 

Gene name: calcitonin receptor-like (CALCRL) 
Unigene number: Hs.152175
Probeset Accession #: L76380
Protein Accession #: NP_005786.1

Pfam: 7TM 2 (7 transmembrane receptor (Secretin family))
Transmembrane domains: predictions underlined 
Signal sequence: first underlined region 

Summary: Calcitonin gene-related peptide (CGRP) is a neuropeptide with diverse
biological effects including potent vasodilator activity. The human CGRP1 receptor
shares significant amino acid sequence homology with the human calcitonin receptor,
a member of the G-protein-coupled receptor superfamily. Stable expression of the
receptor in human embryonic kidney 293 (HEK 293) cells produces specific,
high-affinity binding sites for CGRP. Exposure of these cells to CGRP results in a
60-fold increase in cAMP production.

MEKKCTLYFL VLLPFFMILV TAELEESPED SIQLGVTRNK IMTAQYECYQ KIMQDPIQQA 60

EGVYCNRTWD GWLCWNDVAA GTESMQLCPD YFQDFDPSEK VTKICDQDGN WFRHPASNRT 120 

WTNYTQCNVN THEKVKTALN LFYLTIIGHG LSIASLLISL GIFFYFKSLS CQRITLHKNL 180

FFSFVCNSW TIIHLTAVAN NOALVATNPV SCKVSQFIHL YLMGCNYFWM LCEGIYLHTL 240

IWAVFAEKO HLMWYYFLGW GFPLIPACIH AIARSLYYND NCWISSDTHL LYIIHGPICA 300

ALLVNLFFLL NIVRVLITKL KVTHQAESNL YMKAVRATLI LVPLLGIEFV LIPWRPEGKI 360

AEEVYDYIMH ILMHFOGLLV STIFCFFNGE VQAILRRNWN QYKIQFGNSF SNSEALRSAS 420
YTVSTISDGP GYSHDCPSEH LNGKSIHDIE NVLLKPENLY N 



ACC5 protein sequence 

Gene name: Selectin E (endothelial adhesion molecule 1)
Unigene number: Hs.89546
Probeset Accession #: M24736 
Protein Accession #: NP_000441.1 

Pfam: lectin c, EGF like domain, sushi (SCR domain) 
Signal sequence: first underlined region 
Transmembrane domain: second underlined region 

Summary: Focal adhesion of leukocytes to the blood vessel lining is a key step in
inflammation and certain vascular disease processes. Endothelial leukocyte
adhesion molecule-1 (ELAM-1), a cell surface glycoprotein expressed by
cytokine-activated endothelium, mediates the adhesion of blood neutrophils. The
primary sequence of ELAM-1 predicts an amino-terminal lectin-like domain, an EGF
domain, and six tandem repetitive motifs (about 60 amino acids each) related to
those found in complement regulatory proteins. A similar domain structure is also
found in the MEL-14 lymphocyte cell surface homing receptor and in granule
membrane protein 140, a membrane glycoprotein of platelet and endothelial
secretory granules that can be rapidly mobilized (in less than 5 minutes) to the
cell surface by thrombin and other stimuli. Thus, ELAM-1 may be a member of a
nascent gene family of cell surface molecules involved in the regulation of
inflammatory and immunological events at the interface of vessel wall and blood.



MIASOFLSAL TLVLLIKESG AWSYNTSTEA MTYDEASAYC QQRYTHLVAI QNKEEIEYLN 60

SILSYSPSYY WIGIRKVNNV WVWVGTQKPL TEEAKNWAPG EPNNRQKDED CVEIY1KREK 120 

DVGMWNDERC SKKKLALCYT AACTNTSCSG HGECVETINN YTCKCDPGFS GLKCEQIVNC 180 

TALESPEHGS LVCSHPLGNF SYNSSCSISC DRGYLPSSME TMQCMSSGEW SAPIPACNW 240

ECDAVTNPAN GFVECFQNPG SFPWNTTCTF DCEEGFELMG AQSLQCTSSG NWDNEKPTCK 300 

AVTCRAVRQP QNGSVRCSHS PAGEFTFKSS CNFTCEEGFM LQGPAQVECT TQGQWTQQIP 360 

VCEAFQCTAL SNPERGYMNC LPSASGSFRY GSSCEFSCEQ GFVLKGSKRL QCGPTGEWDN 420 

EKPTCEAVRC DAVHQPPKGL VRCAHSPIGE FTYKSSCAFS CEEGFELYGS TQLECTSQGQ 480

WTEEVPSCQV VKCSSLAVPG KINMSCSGEP VFGTVCKFAC PEGWTLNGSA ARTCGATGHW 540 

SGLLPTCEAP TESNIPLVAG LSAAGLSLLT LAPFLLWLRK CLRKAKKFVP ASSCQSLESD 600
GSYQKPSYIL 



ACC8 protein sequence 

Gene name: Chemokine (C-X-C motif) receptor 4 (fusin)
Unigene number: Hs.89414
Probeset Accession #: L06797 
Protein Accession #: NP_003458.1 

Pfam: 7TM 1 (7 transmembrane receptor (rhodopsin family)) 
Signal sequence: none identified 
Transmembrane domains: predictions underlined 

Summary: The chemokine receptor CXCR4 (also designated fusin and LESTR) is a
cofactor for fusion and entry of T cell-tropic strains of HIV-1.

MEGISIYTSD NYTEEMGSGD YDSMKEPCFR EENANFNKIF LPTIYSIIFL TGIVGNGLVI 60

LVMGYQKKLR SMTDKYRLHL SVADLLFVIT LPFWAVDAVA NWYFGNFLCK AVHVIYTVNL 120

YSSVLILAFI SLDRYLAIVH ATNSQRPRKL LAEKWYVGV WIPALLLTIP DFIFANVSEA 180

DDRYICDRFY PNDLWVWFO FOHIMVGLIL PGIVILSCYC IIISKLSHSK GHQKRKALKT 240

TV1LILAFFA CWLPYYIGIS IDSFILLEII KQGCEFENTV HKWISITEAL AFFHCCLNPI 300
LYAFLGAKFK TSAQHALTSV SRGSSLKILS KGKRGGHSSV STESESSSFH SS
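The CALCRL and CXCR4 entries mark predicted transmembrane segments without naming the prediction method. As an illustration only, and not necessarily the method behind these annotations, a classic approach is a Kyte-Doolittle hydropathy scan: average residue hydropathy over a sliding window (19 residues is a conventional choice) and flag windows whose mean is at least about 1.6. The function name and default parameters below are assumptions.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def tm_candidates(seq, window=19, threshold=1.6):
    """Return merged 1-based (start, end) spans of windows with mean >= threshold."""
    residues = "".join(seq.split()).upper()
    spans = []
    for i in range(len(residues) - window + 1):
        chunk = residues[i:i + window]
        # Non-standard characters (e.g. extraction debris) contribute 0.
        mean = sum(KD.get(aa, 0.0) for aa in chunk) / window
        if mean >= threshold:
            start, end = i + 1, i + window
            if spans and start <= spans[-1][1] + 1:
                spans[-1] = (spans[-1][0], end)  # overlapping window: extend span
            else:
                spans.append((start, end))
    return spans


# Arbitrary test string: a hydrophobic leucine stretch flanked by lysines.
print(tm_candidates("K" * 10 + "L" * 25 + "K" * 10))  # [(6, 40)]
```

The merged span is wider than the leucine stretch itself because any window holding at least 14 leucines clears the threshold; real predictors refine such boundaries, so treat the output as a rough candidate region.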



ACF2 protein sequence 

Gene name: Endothelial cell-specific molecule 1
Unigene number: Hs.41716 
Probeset Accession #: X89426 
Protein Accession #: NP_008967.1 
Signal sequence: underlined 

Pfam: IGFBP (Insulin-like growth factor binding proteins)

Summary: Human endothelial cell-specific molecule 1 (ESM-1) was cloned from a
human umbilical vein endothelial cell (HUVEC) cDNA library. Constitutive ESM-1
gene expression is seen in HUVECs but not in other human cell lines. The cDNA
sequence contains an open reading frame of 552 nucleotides and a 1398-nucleotide
3'-untranslated region including several domains involved in mRNA instability and
five putative polyadenylation consensus sequences. The deduced 184-amino acid
sequence defines a cysteine-rich protein with a functional NH2-terminal hydrophobic
signal sequence.

MKSVLLLTTL LVPAHLVAAW SNNYAVDCPQ HCDSSECKSS PRCKRTVLDD CGCCRVCAAG 60
RGETCYRTVS GMDGMKCGPG LRCQPSNGED PFGEEFGICK DCPYGTFGMD CRETCNCQSG 120
ICDRGTGKCL KFPFFQYSVT KSSNRFVSLT EHDMASGDGN IVREEWKEN AAGSPVMRKW 180
LNPR 
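The ESM-1 summary's numbers agree with each other: an open reading frame of 552 nucleotides holds 552 / 3 = 184 codons, matching the deduced 184-amino acid protein, which implies the stated 552 nucleotides do not count a stop codon. The trailing numbers in these listings (60, 120, 180, ...) are cumulative residue counts, so a listing can also be checked for residues lost or merged in transcription. A minimal Python sketch of both checks follows; the function names are hypothetical, and position numbers split by extraction (such as "18 0") are rejoined before comparison.

```python
def orf_codons(nucleotides: int) -> int:
    """Number of codons in an open reading frame of the given length."""
    if nucleotides % 3:
        raise ValueError("an open reading frame length must be a multiple of 3")
    return nucleotides // 3


def parse_listing(lines):
    """Join residue groups into one sequence, checking each trailing count.

    A line looks like "MKSVLLLTTL LVPAHLVAAW ... CGCCRVCAAG 60"; any
    all-digit tokens at the end are joined and read as the cumulative count.
    """
    residues = ""
    for line in lines:
        tokens = line.split()
        digits = []
        while tokens and tokens[-1].isdigit():
            digits.insert(0, tokens.pop())
        residues += "".join(tokens)
        if digits:
            expected = int("".join(digits))
            if len(residues) != expected:
                raise ValueError(
                    f"count {expected} expected, {len(residues)} residues found"
                )
    return residues


print(orf_codons(552))  # 184
print(len(parse_listing(["ABCDEFGHIJ KLMNOPQRST 2 0", "UVWXY"])))  # 25
```

Run on the three full lines of the ESM-1 listing as printed here, `parse_listing` raises at the line ending in 180, where the group IVREEWKEN holds only nine residues.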



ACF4 protein sequence 

Gene name: p53-responsive gene 2, similar to D. melanogaster peroxidasin (U11052)
Unigene number: Hs.118893
Probeset Accession #: D86983 
Protein Accession #: BAA13219 

Pfam: LRRNT (Leucine rich repeat N-terminal domain), LRR (Leucine Rich Repeat),
LRRCT (Leucine rich repeat C-terminal domain), Ig (immunoglobulin domain),
Peroxidase, VWC (von Willebrand factor type C domain)

Summary: ACF4 is a gene originally identified from KG-1 cell and brain cDNA 
libraries. 
SRPWWLRASE RPSAPSAMAK RSRGPGRRCL LALVLFCAWG TLAWAQKPG AGCPSRCLCF 60

RTTVRCMHLL LEAVPAVAPQ TSILDLRFNR IREIQPGAFR RLRNLNTLLL NNNQIKRIPS 120 

GAFEDLENLK YLYLYKNEIQ SIDRQAFKGL ASLEQLYLHF NQIETLDPDS FQHLPKLERL 180 

FLHNNRITHL VPGTFNHLES MKRLRLDSNT LHCDCEILWL ADLLKTYAES GNAQAAAICE 240 

YPRRIQGRSV ATITPEELNC ERPRITSEPQ DADVTSGNTV YFTCRAEGNP KPEIIWLRNN 300 

NELSMKTDSR LNLLDDGTLM IQNTQETDQG IYQCMAKNVA GEVKTQEVTL RYFGSPARPT 360 

FVIQPQNTEV LVGESVTLEC SATGHPPPRI SWTRGDRTPL PVDPRVNITP SGGLYIQNW 420 

QGDSGEYACS ATNNIDSVHA TAFIIVQALP QFTVTPQDRV VIEGQTVDFQ CEAKGNPPPV 480 

IAWTKGGSQL SVDRRHLVLS SGTLRISGVA LHDQGQYECQ AVNIIGSQKV VAHLTVQPRV 540 

TPVFASIPSD TTVEVGANVQ LPCSSQGEPE PAITWNKDGV QVTESGKFHI SPEGFLTIND 600 

VGPADAGRYE CVARNTIGSA SVSMVLSVNV PDVSRNGDPF VATSIVEAIA TVDRAINSTR 660 

THLFDSRPRS PNDLLALFRY PRDPYTVEQA RAGEIFERTL QLIQEHVQHG LMVDLNGTSY 720 

HYNDLVSPQY LNLIANLSGC TAHRRVNNCS DMCFHQKYRT HDGTCNNLQH PMWGASLTAF 780 

ERLLKSVYEN GFNTPRGINP HRLYNGHALP MPRLVSTTLI GTETVTPDEQ FTHMLMQWGQ 840 

FLDHDLDSTV VALSQARFSD GQHCSNVCSN DPPCFSVMIP PNDSRARSGA RCMFFVRSSP 900 

VCGSGMTSLL MNSVYPREQI NQLTSYIDAS NVYGSTEHEA RSIRDLASHR GLLRQGIVQR 960 

SGKPLLPFAT GPPTECMRDE NESPIPCFLA GDHRANEQLG LTSMHTLWFR EHNRIATELL 1020 

KLNPHWDGDT IYYETRKIVG AEIQHITYQH WLPKILGEVG MRTLGEYHGY DPGINAGIFN 1080 

AFATAAFRFG HTLVNPLLYR LDENFQPIAQ DHLPLHKAFF SPFRIVNEGG IDPLLRGLFG 1140 

VAGKMRVPSQ LLNTELTERL FSMAHTVALD LAAINIQRGR DHGIPPYHDY RVYCNLSAAH 1200 

TFEDLKNEIK NPEIREKLKR LYGSTLNIDL FPALWEDLV PGSRLGPTLM CLLSTQFKRL 1260 

RDGDRLWYEN PGVFSPAQLT QIKQTSLARI LCDNADNITR VQSDVFRVAE FPHGYGSCDE 1320 

IPRVDLRVWQ DCCEDCRTRG QFNAFSYHFR GRRSLEFSYQ EDKPTKKTRP RKIPSVGRQG 1380 

EHLSNSTSAF STRSDASGTN DFREFVLEMQ KTITDLRTQI KKLESRLSTT ECVDAGGESH 1440 
ANNTKWKKDA CTICECKDGQ VTCFVEACPP ATCAVPVNIP GACCPVCLQK RAEEKP 



ACF5 protein sequence 

Gene name: Mitogen-activated protein kinase kinase kinase kinase 4 
Unigene number: Hs.3628 
Probeset Accession #: N54067 
Protein Accession #: NP_004825.1 

Pfam: pkinase (Eukaryotic protein kinase domain), CNH domain

Summary: The yeast serine/threonine kinase STE20 activates a signaling cascade
that includes STE11 (mitogen-activated protein kinase kinase kinase), STE7
(mitogen-activated protein kinase kinase), and FUS3/KSS1 (mitogen-activated protein
kinase) in response to signals from both Cdc42 and the heterotrimeric G proteins
associated with transmembrane pheromone receptors. ACF5 is a human cDNA encoding a
protein kinase homologous to STE20. This protein kinase, also designated
HPK/GCK-like kinase (HGK), is encoded by an open reading frame of 1165 amino acids
containing 11 kinase subdomains. HGK is a serine/threonine protein kinase that
specifically activates the c-Jun N-terminal kinase (JNK) signaling pathway when
transfected into 293T cells, but does not stimulate either the extracellular
signal-regulated kinase or the p38 kinase pathway. HGK also increases
AP-1-mediated transcriptional activity in vivo. HGK may be a novel activator of
the JNK pathway. The cascade may be HGK -> TAK1 -> MKK4/MKK7 -> JNK, which may
mediate the TNF-alpha signaling pathway.



MANDSPAKSL VDIDLSSLRD PAGIFELVEV VGNGTYGQVY KGRHVKTGQli AAIKVMDVTE 60

DEEEEIKLEI NMLKKYSHHR NIATYYGAFI KKSPPGHDDQ LWLVMEFCGA GSITDLVKNT 120

KGNTLKEDWI AYISREILRG LAHLHIHHVI HRDIKGQNVL LTENAEVKLV DFGVSAQLDR 180 

TVGRRNTFIG TPYWMAPEVI ACDENPDATY DYRSDLWSCG ITAIEMAEGA PPLCDMHPMR 240 

ALFLIPRNPP PRLKSKKWSK KFFSFIEGCL VKNYMQRPST EQLLKHPFIR DQPNERQVRI 300

QLKDHIDRTR KKRGEKDETE YEYSGSEEEE EEVPEQEGEP SSIVNVPGES TLRRDFLRLQ 360

QENKERSEAL RRQQLLQEQQ LREQEEYKRQ LLAERQKRIE QQKEQRRRLE EQQRREREAR 420 

RQQEREQRRR EQEEKRRLEE LERRRKEEEE RRRAEEEKRR VEREQEYIRR QLEEEQRHLE 480 

VLQQQLLQEQ AMLLHDHRRP HPQHSQQPPP PQQERSKPSF HAPEPKAHYE PADRAREVPV 540 

RTTSRSPVLS RRDSPLQGSG QQNSQAGQRN STSIEPRLLW ERVEKLVPRP GSGSSSGSSN 600 

SGSQPGSHPG SQSGSGERFR VRSSSKSEGS PSQRLENAVK KPEDKKEVFR PLKPAGEV'*' 660 

TALAKELRAV EDVRPPHKVT DYSSSSEESG TTDEEDDDVE QEGADESTSG PEDTRAASi^ 720

NLSNGETESV KTMIVHDDVE SEPAMTPSKE GTLIVRQTQS ASSTLQKHKS SSSFTPFIDP 780 

RLLQISPSSG TTVTSWGFS CDGMRPEAIR QDPTRKGSW NVNPTNTRPQ SDTPEIRKYK 840

KRFNSEILCA ALWGVNLLVG TESGLMLLDR SGQGKVYPLI NRRRFQQMDV LEGLNVLVTI 900 

SGKKDKLRVY YLSWLRNKIL HNDPEVEKKQ GWTTVGDLEG CVHYKWKYE RIKFLVIALK 960

SSVEVYAWAP KPYHKFMAFK SFGELVHKPL LVDLTVEEGQ RLKVIYGSCA GFHAVDVDSG 1020 

SVYDIYLPTH VRKNPHSMIQ CSIKPHAIII LPNTDGMELL VCYEDEGVYV NTYGRITKDV 1080 

VLQWGEMPTS VAYIRSNQTM GWGEKAIEIR SVETGHLDGV FMHKRAQRLK FLCERNDKVF 1140 
FASVRSGGSS QVYFMTLGRT SLLSW



ACF8 protein sequence 

Gene name: Phospholipase A2, group IVC (cytosolic, calcium-independent)
Unigene number: Hs.18858
Probeset Accession #: AA054087
Protein Accession #: NP_003697.1
Pfam: none identified 

Summary: ACF8 is a membrane-bound, calcium-independent PLA2, named cPLA2-gamma.
The sequence encodes a 541-amino acid protein containing a domain with significant
homology to the catalytic domain of the 85-kDa cPLA2 (cPLA2-alpha). cPLA2-gamma
does not contain the regulatory calcium-dependent lipid binding (CaLB) domain found
in cPLA2-alpha. cPLA2-gamma does contain two consensus motifs for lipid
modification, a prenylation motif (-CCLA) at the C terminus and a myristoylation
site at the N terminus. cPLA2-gamma prefers arachidonic acid over palmitic acid at
the sn-2 position of phosphatidylcholine. The cPLA2-gamma gene yields a 3-kilobase
message that is highly expressed in heart and skeletal muscle, suggesting a
specific role in these tissues.



MGSSEVSIIP GLQKEEKAAV ERRRLHVLKA LKKLRIEADE APWAVLGSG GGLRAHIACL 60 

GVLSEMKEQG LLDAVTYLAG VSGSTWAISS LYTNDGDMEA LEADLKHRFT RQEWDLAKSL 120 

QKTIQAARSE NYSLTDFWAY MVISKQTREL PESHLSNMKK PVEEGTLPYP IFAAIDNDLQ 180

PSWQEARAPE TWFEFTPHHA GFSALGAFVS ITHFGSKFKK GRLVRTHPER DLTFLRGLWG 240 

SALGNTEVIR EYIFDQLRNL TLKGLWRRAV ANAKSIGHLI FARLLRLQES SQGEHPPPED 300 

EGGEPEHTWL TEMLENWTRT SLEKQEQPHE DPERKGSLSN LMDFVKKTGI CASKWEWGTT 360 

HNFLYKHGGI RDKIMSSRKH LHLVDAGLAI NTPFPLVLPP TREVHLILSF DFSAGDPFET 420 

IRATTDYCRR HKIPFPQVEE AELDLWSKAP ASCYILKGET GPWIHFPLF NIDACGGDIE 480
AWSDTYDTFK LADTYTLDW VLLLALAKKN VRENKKKILR ELMNVAGLYY PKDSARSCCL 540
A 



ACG1 protein sequence

Gene name: Carbohydrate (chondroitin 6/keratan) sulfotransferase 1
Unigene number: Hs.104576
Probeset Accession #: AA868063 
Protein Accession #: NP_003645.1 
Pfam: none identified 

Summary: Chondroitin 6-sulfotransferase (C6ST) is the key enzyme in the
biosynthesis of chondroitin 6-sulfate, a glycosaminoglycan implicated in
chondrogenesis, neoplasia, atherosclerosis, and other processes. C6ST catalyzes
the transfer of sulfate from 3'-phosphoadenosine 5'-phosphosulfate to carbon 6 of
the N-acetylgalactosamine residues of chondroitin.

MQCSWKAVLL LALASIAIQY TAIRTFTAKS FHTCPGLAEA GLAERLCEES PTFAYNLSRK 60

THILILATTR SGSSFVGQLF NQHLDVFYLF EPLYHVQNTL IPRFTQGKSP ADRRVMLGAS 120

RDLLRSLYDC DLYFLENYIK PPPVNHTTDR IFRRGASRVL CSRPVCDPPG PADLVLEEGD 180

CVRKCGLLNL TVAAEACRER SHVAIKTVRV PEVNDLRALV EDPRLNLKVI QLVRDPRGIL 240

ASRSETFRDT YRLWRLWYGT GRKPYNLDVT QLTTVCEDFS NSVSTGLMRP PWLKGKYMLV 300

RYEDLARNPM KKTEEIYGFL GIPLDSHVAR WIQNNTRGDP TLGKHKYGTV RNSAATAEKW 360

RFRLSYDIVA FAQNACQQVL AQLGYKIAAS EEELKNPSVS LVEERDFRPF S



ACG5 protein sequence 

Gene name: Multimerin 

Unigene number: Hs.268107

Probeset Accession #: U27109 

Protein Accession #: AAC52065 

Signal sequence: prediction underlined

Pfam: EGF-like domain, C1q domain

Summary: Multimerin is a massive, soluble protein found in platelets and in the
endothelium of blood vessels. Multimerin is composed of variably sized,
disulfide-linked multimers, the smallest of which is a homotrimer. Multimerin is a
factor V/Va-binding protein and may function as a carrier protein for platelet
factor V. Northern analyses show a 4.7-kilobase transcript in cultured endothelial
cells, a megakaryocytic cell line, platelets, and highly vascular tissues. The
multimerin cDNA can encode a protein of 1228 amino acids with the probable signal
peptide cleavage site between amino acids 19 and 20. The protein is predicted to be
hydrophilic and to contain 23 N-glycosylation sites. The adhesive motif RGDS
(Arg-Gly-Asp-Ser) and an epidermal growth factor-like domain were identified.
Multimerin contains probable coiled-coil structures in the central portion of its
sequence. Additionally, the carboxyl-terminal region of multimerin resembles the
globular, non-collagen-like, carboxyl-terminal domains of several other trimeric
proteins, including complement C1q and collagens type VIII and X.



SLQILPTTRV 60
KLQNLTLPTN 120
TGGVGNRAPR 180
VPGGKGPCGW 240
IHTNQAESHT 300
DVRNTYSSLE 360
KTVSSLSEDL 420
TLTCEKPIKE 480
NAPAAESVSN 540
ECEDMLSKCR 600
PLLEQGASLR 660
DALERRINEY 720
TILTFIPQFH 780
TSQVRKYQQN 840
DQALQLQVLN 900
HSLPDIQLLQ 960
INALKKPTVN 1020
FTGDNCTIKL 1080
TPRTGKFRIP 1140
ALLELNYGQE 1200

VWLRLAKGTI PAKFPPVTTF SGYLLYRT
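The multimerin summary implies two quick calculations. Removing a signal peptide that ends at residue 19 from a 1228-residue precursor leaves 1228 - 19 = 1209 residues. Predicted N-glycosylation sites are conventionally located with the sequon Asn-X-Ser/Thr, where X is any residue except proline; the summary does not say how its 23 sites were counted, so the Python scan below is an assumed method, and the fragmentary multimerin listing in this copy cannot be used to re-derive that count. Function names are hypothetical.

```python
import re

# Asn-X-Ser/Thr sequon with X != Pro; the zero-width lookahead lets
# overlapping sequons (e.g. "NNST") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    residues = "".join(seq.split()).upper()
    return [m.start() + 1 for m in SEQUON.finditer(residues)]


def mature_length(precursor_length, signal_end):
    """Residues left after cleaving a signal peptide that ends at signal_end."""
    return precursor_length - signal_end


print(mature_length(1228, 19))        # 1209
print(sequon_positions("NGSNPTNAT"))  # [1, 7]; "NPT" is skipped because X is Pro
```

A sequon is necessary but not sufficient for glycosylation, so counts from a scan like this are upper bounds on occupied sites.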



ACC6 protein sequence 

Gene name: Homo sapiens cDNA FLJ11502 fis, clone HEMBA1002102, weakly similar to
ANKYRIN

Unigene number: Hs.213194
Probeset Accession #: AA187101
Protein Accession #: none 
Pfam: ankyrin repeats 

VAARPPVSRM EPRAADGCFL GDVGFWVERT PVHEAAQRGE SLQLQQLIES GACVNQVTVD 60 
SITPLHAASL QGQARCVQLL LAAGAQVDAR NIDGSTPLCD ACASGSIECV KLLLSYGAKV 120 
NPPLYTASPL HEASFPRLLS TLASTPWIN



ACC7 protein sequence 

Gene name: Human RALA gene

Unigene number: Hs.6906 

Probeset Accession #: AA083572 cluster 

Protein Accession #: P11233

Pfam: ras 

Features: CAAX motif is underlined 

Summary: The RALA gene encodes a low-molecular-mass ras-like GTP-binding protein
that shares about 50% similarity with the ras proteins. GTP-binding proteins
mediate the transmembrane signaling initiated by the occupancy of certain cell
surface receptors. The RALA gene maps to 7p22-p15.

MAANKPKGQN SLALHKVIMV GSGGVGKSAL TLQFMYDEFV EDYEPTKADS YRKKWLDGE 60 
EVQIDILDTA GQEDYAAIRD NYFRSGEGFL CVFSITEMES FAATADFREQ ILRVKEDENV 120
PFLLVGNKSD LEDKRQVSVE EAKNRAEQWN VNYVETSAKT RANVDKVFFD LMREIRARKM 180 
EDSKEKNGKK KRKSLAKRIR ERCC-: : 



ACC9 protein sequence 

Gene name: KIAA0955 protein 

Unigene number: Hs.10031

Probeset Accession #: AA027168 

Protein Accession #: BAA76799.1 

Pfam: CARD (Caspase recruitment domain) 
Summary: The gene was originally isolated as a brain cDNA. The coding region
contains a CARD domain, suggesting involvement in apoptotic signaling pathways.



MMRQRQSHYC SVLFLSVNYL GGTFPGDICS EENQIVSSYA SKVCFEIEED YKNRQFLGPE 60 

GNVDVELIDK STNRYSVWFP TAGWYLWSAT GLGFLVRDEV TVTIAFGSWS QHLALDLQHH 120

EQWLVGGPLF DVTAEPEEAV AEIHLPHFIS LQGEVDVSWF LVAHFKNEGM VLEHPARVEP 180 

FYAVLESPSF SLMGILLRIA SGTRLSIPIT SNTLIYYHPH PEDIKFHLYL VPSDALLTKA 240 

IDDEEDRFHG VRLQTSPPME PLNFGSSYIV SNSANLKVMP KELKLSYRSP GEIQHFSKFY 300 

AGQMKEPIQL EITEKRHGTL VWDTEVKPVD LQLVAASAPP PFSGAAFVKE NHRQLQARMG 360 

DLKGVLDDLQ DNEVLTENEK ELVEQEKTRQ SKNEALLSMV EKKGDLALDV LFRSISERDP 420
YLVSYLRQQN L 

ACF6 Protein sequence 

Gene name: Homo sapiens cDNA FLJ10669 fis, clone NT2RP2006275, weakly similar to
Microtubule-associated protein 1B [CONTAINS: LIGHT CHAIN LC1]

Unigene number: Hs.66048

Probeset Accession #: AA609717 

Protein Accession #: BAA91743.1 
Pfam: none identified

Summary: The cDNA for FLJ10669 was originally isolated from NT2 neuronal precursor
cells (a teratocarcinoma cell line) after 2 weeks of retinoic acid (RA) treatment.
The protein sequence is similar to that of microtubule-associated protein 1B
(MAP-1B), suggesting a role for ACF6 in regulating the cytoskeleton.






MGVGRLDMYV LHPPSAGAER TLASVCALLV WHPAGPGEKV VRVLFPGCTP PACLLDGLVR 60 

LQHLRFLREP WTPQDLEGP GRAESKESVG SRDSSKREGL LATHPRPGQE RPGVARKEPA 120 

RAEAPRKTEK EAKTPRELKK DPKPSVSRTQ PREVRRAASS VPNLKKTNAQ AAPKPRKAPS 180

TSHSGFPPVA NGPRSPPSLR CGEASPPSAA CGSPASQLVA TPSLELGPIP AGEEKALELP 240 
LAASSIPRPR TPSPESHRSP AEGSERLSLS PLRGGEAGPD ASPTVTTPTV TTPSLPAEVG 300

SPHSTEVDES LSVSFEQVLP PSAPTSEAGL SLPLRGPRAR RSASPHDVDL CLVSPCEFEH 360

RKAVPMAPAP ASPGSSNDSS ARSQERAGGL GAEETPPTSV SESLPTLSDS DPVPLAPGAA 420 

DSDEDTEGFG VPRHDPLPDP LKVPPPLPDP SSICMVDPEM LPPKTARQTE NVSRTRKPLA 480 

RPNSRAAAPK ATPVAAAKTK GLAGGDRASR PLSARSEPSE KGGRAPLSRK SSTPKTATRG 540

PSGSASSRPG VSATPPKSPV YLDLAYLPSG SSAHLVDEEF FQRVRALCYV ISGQDQRKEE 600

GMRAVLDALL ASKQHWDRDL QVTLIPTFDS VAMHTWYAET HARHQALGIT VLGSNGMVSM 660 
QDDAFPACKV EF 